sys-bio / libOmexMeta

libOmexMeta is a library aimed at providing developer-level support for reading, writing, editing and managing semantic annotations for biosimulation models.
https://sys-bio.github.io/libOmexMeta/
Apache License 2.0

What is the convention for defining namespaces for OMEX library and identifiers? #96

Open jonrkarr opened 3 years ago

jonrkarr commented 3 years ago

pyomexmeta declares several namespace prefixes that do not appear to be used in the documents it produces.

For example, this script

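```python
from pyomexmeta import RDF

rdf = RDF()

# Read the SBML model BIOMD0000000075 from disk
with open('BIOMD0000000075.xml', 'r') as file:
    xml = file.read()

# Create an editor for the model and add model-level annotations
editor = rdf.to_editor(xml)
editor.add_curator('0000-0000-0000-0000')
editor.add_date_created('2021-06-01')
editor.add_pubmed('1234')

# Serialize the annotations as abbreviated RDF/XML
rdf.to_file('omex-meta.rdf', 'rdfxml-abbrev')
```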

declares these namespace prefixes, which don't appear to be used

I assume the local prefix is reserved for some purpose that not all documents will use.

I have a few questions about the OMEX library and Identifiers.org prefix declarations.

CiaranWelsh commented 3 years ago

In general, the handling of namespaces in libOmexMeta is a little awkward at the moment and could almost certainly be improved, I suspect with something like the observer pattern. Whenever a new URI is added, a namespace map is searched. If a match is found, the namespace and prefix are added to a subset of "seen" namespaces, which is stored in the Editor object. These namespaces need to be passed via LibrdfSerializer::setNamespace to the librdf_serializer, which is only created when needed inside RDF::toString. The repository and local URIs are always added as namespaces before serialization. At this point, changing the behaviour of (e.g.) the local URI so that it is only declared when needed may be quite difficult and would require design changes.
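A minimal Python sketch of the bookkeeping described above may make the flow easier to follow. Everything in it is hypothetical: the SeenNamespaces class, the prefix map, the "repository" prefix name, and the example.org URIs are assumptions for illustration, not libOmexMeta's real classes, prefixes, or URIs.

```python
# Hypothetical illustration only: class, prefix and URI names are assumptions,
# not libOmexMeta's actual implementation.

KNOWN_NAMESPACES = {
    "bqmodel": "http://biomodels.net/model-qualifiers/",
    "dcterms": "http://purl.org/dc/terms/",
    "pubmed": "http://identifiers.org/pubmed:",
}


class SeenNamespaces:
    """Tracks which known namespaces have actually been matched by added URIs."""

    def __init__(self, known):
        self.known = dict(known)  # prefix -> namespace URI
        self.seen = {}            # subset of `known` that some URI has matched

    def add_uri(self, uri):
        # Search the namespace map and remember every namespace the URI falls under.
        for prefix, namespace in self.known.items():
            if uri.startswith(namespace):
                self.seen[prefix] = namespace
        return uri

    def namespaces_to_declare(self, repository_uri, local_uri):
        # Mirrors the behaviour described above: repository and local are
        # always declared, whether or not any URI used them.
        declared = dict(self.seen)
        declared["repository"] = repository_uri
        declared["local"] = local_uri
        return declared


tracker = SeenNamespaces(KNOWN_NAMESPACES)
tracker.add_uri("http://identifiers.org/pubmed:1234")
declared = tracker.namespaces_to_declare(
    "http://example.org/repository/", "http://example.org/repository/model.rdf#"
)
print(sorted(declared))  # ['local', 'pubmed', 'repository']
```

In this sketch a matched namespace such as pubmed is declared even though it only ever appears inside a URI value, while repository and local are added unconditionally; the latter is the step the comment above describes as hard to change.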

The pubmed namespace was added because you called add_pubmed.

See section 2.1 of the OMEX Metadata specification for notes on the local URI.

As for the other questions, I'll need to call in the experts @jhgennari @nickerso .

jonrkarr commented 3 years ago

My question comes from an XML perspective. In XML, namespace prefixes are typically used in conjunction with XML tags and XML attributes. pyomexmeta also exports namespace prefixes for values of XML attributes (e.g., the PubMed namespace). As I understand it, these additional namespaces will not be used by other XML and RDF libraries (e.g., rdflib), because the prefixes are never used.
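The distinction can be checked with Python's standard xml.etree.ElementTree: the parser resolves prefixes in element and attribute names, but attribute values such as rdf:resource are left as opaque strings. The sketch below uses a small hypothetical RDF/XML document (the subject URI and the bqmodel:isDescribedBy predicate are illustrative assumptions, not actual pyomexmeta output) and lists the declared prefixes whose namespace never appears in a name.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical RDF/XML document modelled on the fragments in this thread.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqmodel="http://biomodels.net/model-qualifiers/"
         xmlns:pubmed="http://identifiers.org/pubmed:">
  <rdf:Description rdf:about="http://example.org/model.xml#model">
    <bqmodel:isDescribedBy rdf:resource="http://identifiers.org/pubmed:1234"/>
  </rdf:Description>
</rdf:RDF>"""


def declared_prefixes(xml_text):
    """Map each xmlns prefix declared in the document to its namespace URI."""
    decls = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        decls[prefix] = uri  # a later redeclaration of the same prefix wins
    return decls


def namespaces_used_in_names(xml_text):
    """Namespace URIs that occur in element or attribute names (never values)."""
    used = set()
    for elem in ET.fromstring(xml_text).iter():
        for name in (elem.tag, *elem.attrib):
            if name.startswith("{"):
                used.add(name[1:name.index("}")])
    return used


def unused_prefixes(xml_text):
    """Declared prefixes whose namespace is never used in a tag or attribute name."""
    used = namespaces_used_in_names(xml_text)
    return sorted(p for p, uri in declared_prefixes(xml_text).items() if uri not in used)


print(unused_prefixes(DOC))  # ['pubmed']
```

Here pubmed is reported as unused because http://identifiers.org/pubmed: only ever occurs inside the rdf:resource value, whereas bqmodel counts as used because it qualifies an element name.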

Given that these additional namespaces are not required to define well-formed RDF, if these additional namespace declarations are desired, I think this needs to be clarified in the OMEX Metadata specification. Section 2.1 discusses related topics but doesn't clearly address this issue.

Additionally, what is the convention for identifiers.org? Is http://identifiers.org the namespace, or are the (sub)namespaces registered with identifiers.org (e.g., http://identifiers.org/pubmed:) the namespaces? The latter seems more useful to me, but I can see how this could be a source of confusion, since both are namespaces. In addition, pyomexmeta uses namespace prefixes that terminate with the atypical : rather than the typical / or #. This terminal : can be replaced with / (e.g., https://identifiers.org/pubmed/12334 rather than https://identifiers.org/pubmed:12334), as identifiers.org recognizes both forms.

nickerso commented 3 years ago

Perhaps to answer the original question, and as Ciaran hints at, there really is no convention currently defined in libOmexMeta for when a particular prefix is used or not. In general, I don't think such conventions are useful as the full URIs should always be used and the OMEX metadata spec should be focused on ensuring that the RDF graph is unambiguous and reproducible when a given archive is loaded into different tools.

For the case of pubmed and identifiers.org, the interchangeability of : and / is just an identifiers.org feature (?) and not something that is generally true. Looking at the registry, : is the "correct" character (https://registry.identifiers.org/registry/pubmed), and by defining a prefix of https://identifiers.org/pubmed: we can shortcut annotations in the code so that they only require the actual PubMed ID, giving URIs of the form pubmed:id.
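With a prefix defined this way, a prefixed name is simply the namespace string concatenated with the local identifier. A small Python sketch of the expansion and its inverse, assuming a hypothetical prefix map containing only the pubmed entry; expand and contract are illustrative helpers, not libOmexMeta functions.

```python
# Hypothetical prefix map following the convention described above.
PREFIXES = {"pubmed": "https://identifiers.org/pubmed:"}


def expand(curie, prefixes):
    """Turn a prefixed name such as 'pubmed:1234' into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local_id


def contract(uri, prefixes):
    """Inverse of expand(); the longest matching namespace wins."""
    by_length = sorted(prefixes.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, namespace in by_length:
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace: keep the full URI


print(expand("pubmed:1234", PREFIXES))  # https://identifiers.org/pubmed:1234
print(contract("https://identifiers.org/pubmed:1234", PREFIXES))  # pubmed:1234
```

Checking the longest namespace first is a design choice that keeps contraction unambiguous if one registered namespace happens to be a string prefix of another.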

For libOmexMeta it might be useful to define a convention on prefixes and then ensure that any annotations loaded in libOmexMeta are then normalised as per that convention if they were to be serialised back into a text format. For starters, that convention would largely be driven by the existing convenience methods like add_pubmed but could include any useful ontologies or vocabularies that users often make use of.
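A minimal sketch of such normalisation for identifiers.org URIs, assuming a hypothetical convention table (CANONICAL) that picks https and the : separator for each collection, and treating the / and : forms as equivalent, as discussed earlier in the thread. The table, regular expression, and normalise function are illustrative, not part of libOmexMeta.

```python
import re

# Hypothetical convention: one canonical namespace per identifiers.org collection.
CANONICAL = {"pubmed": "https://identifiers.org/pubmed:"}

# Matches http(s)://identifiers.org/<collection>:<id> or .../<collection>/<id>.
_IDENTIFIERS_ORG = re.compile(r"^https?://identifiers\.org/([A-Za-z0-9._-]+)[:/](.+)$")


def normalise(uri):
    """Rewrite an identifiers.org URI into the canonical form, if one is defined."""
    match = _IDENTIFIERS_ORG.match(uri)
    if match and match.group(1) in CANONICAL:
        return CANONICAL[match.group(1)] + match.group(2)
    return uri  # unknown collection or not identifiers.org: leave untouched


for uri in ("http://identifiers.org/pubmed/1234",
            "https://identifiers.org/pubmed:1234",
            "http://biomodels.net/model-qualifiers/is"):
    print(normalise(uri))
```

Both identifiers.org spellings collapse to https://identifiers.org/pubmed:1234, while URIs outside the convention table pass through unchanged, so applying normalisation before serialisation would not alter annotations the convention does not cover.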

jonrkarr commented 3 years ago

My question is specifically about XML declarations of namespaces that are only used in the values of XML attributes. Because the prefixes of these namespaces are not used by any XML tag or XML attribute, most tools would not need these declarations to interpret XML documents. Is OMEX Metadata recommending these additional prefix declarations as a form of documentation?

Example of a namespace used only in the value of an XML attribute:

```xml
xmlns:pubmed="http://identifiers.org/pubmed:"
rdf:resource="http://identifiers.org/pubmed:1234"
```

Counterexample, showing the more conventional use of an XML prefix declaration:

```xml
xmlns:bqmodel="http://biomodels.net/model-qualifiers/"
bqmodel:is="..."
```

Identifiers.org

Agreed: identifiers.org's use of : breaks conventions. https://identifiers.org/pubmed/ could be used instead of https://identifiers.org/pubmed:. I think identifiers.org supports the terminal / in addition to : for all namespaces.