sys-bio / libOmexMeta

libOmexMeta is a library aimed at providing developer-level support for reading, writing, editing and managing semantic annotations for biosimulation models.
https://sys-bio.github.io/libOmexMeta/
Apache License 2.0

Validate identifier and date objects #93

Open jonrkarr opened 3 years ago

jonrkarr commented 3 years ago

libOmexMeta focuses on specific identifier namespaces. One benefit of this is that identifiers can be checked for validity. It would be helpful for libOmexMeta to take advantage of this to catch mistakes in metadata. This could be simplified by focusing on identifiers.org URIs (rather than, e.g., https://orcid.org/XXXX-XXXX-XXXX-XXXX), because identifiers.org provides a registry of regular expressions for identifiers.
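To make the idea concrete, here is a minimal sketch of a per-identifier check in Python. It is not part of libOmexMeta: `PATTERNS` and `is_valid_identifier` are hypothetical names, and the two regular expressions are written by hand for illustration. They should be confirmed against the identifiers.org registry before being relied on.

```python
import re

# Hypothetical local table of patterns, keyed by namespace prefix.
# Assumption: hand-written for illustration; a real check would take
# the patterns from the identifiers.org registry.
PATTERNS = {
    "orcid": re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$"),
    "pubmed": re.compile(r"^\d+$"),
}

IDENTIFIERS_ORG = "https://identifiers.org/"


def is_valid_identifier(uri):
    """Return True if uri has the form https://identifiers.org/<prefix>:<id>
    and <id> matches the pattern known for <prefix>."""
    if not uri.startswith(IDENTIFIERS_ORG):
        return False
    prefix, sep, local_id = uri[len(IDENTIFIERS_ORG):].partition(":")
    pattern = PATTERNS.get(prefix)
    if not sep or pattern is None:
        # No "prefix:id" structure, or a namespace this table does not know.
        return False
    return pattern.fullmatch(local_id) is not None


print(is_valid_identifier("https://identifiers.org/pubmed:xyz"))
print(is_valid_identifier("https://identifiers.org/orcid:0000-0001-8254-4958"))
```

Unknown namespaces are rejected here rather than passed through. That is a design choice for the sketch; a library might prefer to warn instead.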

Similarly, dates would ideally be validated.
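As a rough illustration of what date validation could look like, the sketch below checks a string against the W3CDTF profile (the format named by the dc:W3CDTF element in the RDF further down). The function name `is_valid_w3cdtf` is hypothetical, and the accepted forms reflect my reading of the W3CDTF note: year, year-month, full date, or full date with a time and a mandatory time zone.

```python
import re
from datetime import date

# Accepts YYYY, YYYY-MM, YYYY-MM-DD, and YYYY-MM-DDThh:mm[:ss[.s]]TZD,
# where TZD is "Z" or an offset such as +01:00.
W3CDTF = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.\d+)?)?"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})"
    r")?)?)?"
)


def is_valid_w3cdtf(text):
    """Return True if text is a syntactically and calendrically valid W3CDTF value."""
    m = W3CDTF.fullmatch(text)
    if m is None:
        return False
    try:
        # Missing month or day default to 1 so the calendar check still runs.
        date(int(m["year"]), int(m["month"] or 1), int(m["day"] or 1))
    except ValueError:
        return False
    if m["hour"] is not None:
        if int(m["hour"]) > 23 or int(m["minute"]) > 59 or int(m["second"] or 0) > 59:
            return False
        tz = m["tz"]
        if tz != "Z" and (int(tz[1:3]) > 23 or int(tz[4:6]) > 59):
            return False
    return True


print(is_valid_w3cdtf("xyz"))
print(is_valid_w3cdtf("2021-03-15"))
```

The regular expression alone would accept a value such as 2021-02-30, so the sketch also builds a `datetime.date` to reject impossible calendar dates.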

Editor methods and RDF.to_file don't validate identifiers

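from pyomexmeta import RDF

rdf = RDF()
with open('BIOMD0000000075.xml', 'r') as file:
    xml = file.read()
editor = rdf.to_editor(xml)

# Each call below is accepted even though 'xyz' is not a valid ORCID, date, or PubMed ID
editor.add_curator('orcid:xyz')
editor.add_date_created('xyz')
editor.add_pubmed('xyz')

# The invalid values are written to the file without validation
rdf.to_file('omex-meta.rdf', 'rdfxml-abbrev')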

RDF.from_file doesn't validate identifiers either

<?xml version="1.1" encoding="utf-8"?>
<rdf:RDF
   xmlns:OMEXlib="http://omex-library.org/"
   xmlns:bqmodel="http://biomodels.net/model-qualifiers/"
   xmlns:dc="https://dublincore.org/specifications/dublin-core/dcmi-terms/"
   xmlns:local="http://omex-library.org/NewOmex.omex/NewModel.rdf#"
   xmlns:pubmed="https://identifiers.org/pubmed:"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://omex-library.org/NewOmex.omex/NewModel.rdf#">
    <dc:creator rdf:resource="https://orcid.org/orcid/0000-0001-8254-4958"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://omex-library.org/NewOmex.omex/NewModel.xml#COPASI0">
    <bqmodel:isDescribedBy rdf:resource="https://identifiers.org/pubmed:xyz"/>
    <dc:created>
      <rdf:Description>
        <dc:W3CDTF>xyz</dc:W3CDTF>
      </rdf:Description>
    </dc:created>
  </rdf:Description>
</rdf:RDF>
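One way a reader-side check could work is to collect every rdf:resource value from the file and run each through a validity predicate. The sketch below uses only the Python standard library, not pyomexmeta. The names `resource_uris`, `invalid_resources`, and `is_pubmed_uri` are hypothetical, and the sample is a trimmed copy of the file above with the XML declaration left out. It only looks at rdf:resource attributes, so literal values such as the dc:W3CDTF date would need a separate check.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed version of the file above, without the XML declaration.
SAMPLE = """\
<rdf:RDF
   xmlns:bqmodel="http://biomodels.net/model-qualifiers/"
   xmlns:dc="https://dublincore.org/specifications/dublin-core/dcmi-terms/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://omex-library.org/NewOmex.omex/NewModel.rdf#">
    <dc:creator rdf:resource="https://orcid.org/orcid/0000-0001-8254-4958"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://omex-library.org/NewOmex.omex/NewModel.xml#COPASI0">
    <bqmodel:isDescribedBy rdf:resource="https://identifiers.org/pubmed:xyz"/>
  </rdf:Description>
</rdf:RDF>
"""


def resource_uris(rdf_xml):
    """Yield every rdf:resource attribute value in an RDF/XML string, in document order."""
    root = ET.fromstring(rdf_xml)
    for element in root.iter():
        uri = element.get(f"{{{RDF_NS}}}resource")
        if uri is not None:
            yield uri


def invalid_resources(rdf_xml, is_valid):
    """Return the rdf:resource URIs that the caller-supplied predicate rejects."""
    return [uri for uri in resource_uris(rdf_xml) if not is_valid(uri)]


def is_pubmed_uri(uri):
    # Deliberately narrow predicate for the demo: only numeric PubMed URIs pass.
    return re.fullmatch(r"https://identifiers\.org/pubmed:\d+", uri) is not None


for uri in invalid_resources(SAMPLE, is_pubmed_uri):
    print("suspicious:", uri)
```

With this narrow predicate, both the orcid.org URI and the pubmed:xyz URI in the sample are reported. A fuller predicate would look up the pattern for each namespace instead.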

jonrkarr commented 3 years ago

The Identifiers.org validator provides some degree of validation of URIs. It might be possible to obtain a batch endpoint or a downloadable file, so that validation could be done more quickly, either through a single request or locally.
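A sketch of the local option, assuming a registry export were available as JSON: load the prefix-to-pattern table once, then validate many URIs in a single pass with no network calls. The JSON layout (`namespaces`, `prefix`, `pattern`) and the function names are assumptions made for this sketch, not a documented identifiers.org format, and the embedded excerpt is hand-written.

```python
import json
import re

# Hypothetical excerpt of a locally cached registry export.
# Assumption: the field names and the two patterns are illustrative only.
REGISTRY_JSON = r"""
{
  "namespaces": [
    {"prefix": "orcid", "pattern": "^\\d{4}-\\d{4}-\\d{4}-\\d{3}[\\dX]$"},
    {"prefix": "pubmed", "pattern": "^\\d+$"}
  ]
}
"""

BASE = "https://identifiers.org/"


def load_patterns(registry_json):
    """Compile one regular expression per namespace prefix."""
    data = json.loads(registry_json)
    return {ns["prefix"]: re.compile(ns["pattern"]) for ns in data["namespaces"]}


def validate_batch(uris, patterns):
    """Map each URI to True or False using only the local pattern table."""
    results = {}
    for uri in uris:
        ok = False
        if uri.startswith(BASE):
            prefix, sep, local_id = uri[len(BASE):].partition(":")
            pattern = patterns.get(prefix)
            if sep and pattern is not None:
                ok = pattern.fullmatch(local_id) is not None
        results[uri] = ok
    return results


patterns = load_patterns(REGISTRY_JSON)
report = validate_batch(
    ["https://identifiers.org/pubmed:12345", "https://identifiers.org/pubmed:xyz"],
    patterns,
)
for uri, ok in report.items():
    print(uri, "valid" if ok else "INVALID")
```

In practice the JSON string would be read from a cached file and refreshed occasionally, which keeps validation fast and usable offline.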