Closed by cmungall 8 years ago
Hi @cmungall
I'm fine with creating a "Chado edition" of RO. The other thing we would certainly need is 'derives_from', which looks like it is present in RO. However, I'm noticing that the names of all of the features in RO lack underscores. Is that how they'd get stored in Chado as well, or is that part of what would be in the Chado edition?
Of course, it must also be loadable in Chado. RO currently fails to load using xsltproc/DBIx::DBStag in Chado; that may be due to the missing is_a or to something else, as I haven't looked closely at what the problem is. If another method is preferable, that needs to be worked into the build procedure.
This is also very important for people consuming the .obo version:
https://github.com/oborel/obo-relations/wiki/Identifiers
But this needs to be made clearer. Groups using an ad-hoc obo parser may not be implementing the obo shorthand ID rule, so it may not be obvious how to connect their go.obo file with the ro.obo file; i.e., it's necessary to use the xref in the Typedef stanza in their main obo file to join to ro.obo.
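As a rough illustration of the join described above, here is a minimal Python sketch that collects the xref values of each Typedef stanza so a shorthand ID such as part_of can be matched against ro.obo. The function name `typedef_xrefs` and the sample text are hypothetical, and the parsing is deliberately simplistic (it is not a full OBO parser).

```python
def typedef_xrefs(obo_text):
    """Map each Typedef id to the list of its xref values."""
    xrefs = {}
    stanza = None       # header line of the stanza currently being read
    current_id = None   # id of the current Typedef stanza, if any
    for raw in obo_text.splitlines():
        line = raw.split(" ! ")[0].strip()  # drop a trailing "! comment"
        if line.startswith("["):
            stanza = line
            current_id = None
        elif stanza == "[Typedef]" and line.startswith("id:"):
            current_id = line[3:].strip()
            xrefs[current_id] = []
        elif stanza == "[Typedef]" and current_id and line.startswith("xref:"):
            xrefs[current_id].append(line[5:].strip())
    return xrefs


# Illustrative sample only; a real go.obo is much larger.
sample = """\
[Term]
id: GO:0000001
name: example term
relationship: part_of GO:0000002

[Typedef]
id: part_of
name: part of
xref: BFO:0000050
"""

print(typedef_xrefs(sample))
```

The resulting map gives the identifier to look up in ro.obo for each shorthand relation used in the main file.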
I (SGD) second the vote for a special stanza with OBO-Format ID "OBO_REL:is_a". I appreciate the fact that "is_a" is not correct OWL format, but for practical database usage, it is definitely needed.
How soon would this special stanza be made available?
Thanks in advance!
Adding to this discussion. We have an OBO parser in Tripal that imports vocabularies into Chado. It seems to get hung up on two issues.
The first is that the 'default-namespace' header is not present. In OBO_REL this had the value 'relationship'. Chado divides the ontologies among two groups of tables (cv and db), and the namespace is used as the default name for the vocabulary. Is the 'default-namespace' header no longer needed, or is it missing from the ro.obo? I see an 'ontology' tag in the header, but I don't see this as a proper header attribute in the OBO v1.2 design spec. Is this meant to replace the 'default-namespace'?
Related: the second issue is the inclusion of terms from other vocabularies (e.g. GO, PATO, etc.). There is a bit of discussion above about that. But parsing these non-RO terms into Chado is a bit of an issue because we don't know the namespace for those terms. We could find it in a roundabout way if the vocabulary is already in Chado (e.g. GO), but if it isn't, we don't have the namespace for it. I see in the header section of the ro.obo file there is a remark indicating where these terms are coming from, but unfortunately it seems there is no formal definition for how these remarks should be formed (so we can't guarantee a formal set of parsing rules), and the link is to an OWL document rather than an OBO... :-( We need an OWL parser for Tripal but we don't have it yet.
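To see which outside vocabularies a given ro.obo actually pulls in, one option is to tally stanza IDs by prefix. The sketch below is a hypothetical helper (`id_prefixes`), not part of Tripal, and the sample stanzas are only illustrative. Note that the prefix identifies the ID space only; it does not recover the namespace being asked about here.

```python
from collections import Counter


def id_prefixes(obo_text):
    """Count stanza ids by prefix (the part before the first colon)."""
    counts = Counter()
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("id:"):
            ident = line[3:].strip()
            # Shorthand ids such as "part_of" carry no prefix at all.
            prefix = ident.split(":", 1)[0] if ":" in ident else "(no prefix)"
            counts[prefix] += 1
    return counts


# Illustrative sample only.
sample = """\
format-version: 1.2
ontology: ro

[Term]
id: GO:0008150

[Term]
id: PATO:0000001

[Typedef]
id: RO:0002202

[Typedef]
id: part_of
"""

print(dict(id_prefixes(sample)))
```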
Can the default-namespace be added back into the header of the ro.obo, or should we be using the 'ontology' tag? Also, can the namespace for the referenced ontologies be included in the remark somewhere, and can a formal definition for these types of remarks be added to the obo standard? (Maybe there is one and I don't know it.)
Thanks.
As far as the default-namespace tag is concerned, there seems to be some information in the obo spec:
After parsing an OBO Document, any frame without a namespace is assigned the default-namespace, from the OBO Document header. If this is not specified, the Parser assigns a namespace arbitrarily. It is recommended this is equivalent to the URL or file path from which the document was retrieved.
I use the onto-perl parser, which does not attempt to provide a namespace in the absence of a default-namespace tag (as happens with the latest RO). So my current choice is to generate one from either the file path or the HTTP URL. The file path seems to be a good choice, as my loader currently does not accept remote URLs.
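The fallback described above can be sketched roughly as follows, in Python rather than onto-perl. The names `read_header` and `resolve_namespace` are hypothetical, and the header parsing is a simplification that assumes one `tag: value` pair per line before the first stanza.

```python
def read_header(obo_text):
    """Collect header tag/value pairs (everything before the first stanza)."""
    header = {}
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break                      # first stanza reached, header is over
        if not line or line.startswith("!"):
            continue                   # skip blank lines and comments
        if ":" in line:
            tag, value = line.split(":", 1)
            header.setdefault(tag.strip(), value.strip())
    return header


def resolve_namespace(obo_text, source_path):
    """Use default-namespace when present, else fall back to the source path."""
    header = read_header(obo_text)
    if "default-namespace" in header:
        return header["default-namespace"]
    return source_path


# Illustrative headers only.
with_ns = "format-version: 1.2\ndefault-namespace: relationship\n\n[Typedef]\nid: part_of\n"
without_ns = "format-version: 1.2\nontology: ro\n\n[Typedef]\nid: part_of\n"

print(resolve_namespace(with_ns, "/data/ro.obo"))
print(resolve_namespace(without_ns, "/data/ro.obo"))
```

Passing the path in explicitly keeps the function independent of how the document was fetched, so a URL could be supplied instead if the loader later accepts one.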
However, as @spficklin asked, whether using the ontology tag is allowed instead, and how it differs from the default-namespace tag, is unclear to me; the two seem to overlap. Any ideas or thoughts, @cmungall?
How about we use the core ontology (core.owl) instead of ro.owl? It seems to bring in the basic relations from the BFO import. Then we could just convert it using owltools.
So, what do we miss if we do that?
It definitely bypasses the extra imports (GO/PATO etc.) that are present in ro.obo.
If the basic relations are all you need then you can do this, but I'm not sure it necessarily solves some of the issues above.
If the issue is the classes that are present (used to define domain/range constraints), then we can filter these out (--remove-tbox in owltools).
It seems to also filter out the important ones, for example derives from. Is there any way I could choose what to filter, such as a particular import, for example go-biotic and/or pato?
My issue is that loading ro will also bring in those imported terms, which conflict when those ontologies are loaded separately. I don't know how they co-exist in Chado, if they need to at all.
I couldn't figure out a way (mostly using owltools) to filter out the imported ontologies, mainly the ones that belong to other standalone ontologies. I tried using the --remove-import-declaration option in owltools; however, even after filtering pato-import.owl, some of the PATO terms still remain. Just to be clear, here are the import declarations in the OWL:
<owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/annotations.owl"/>
<owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/core.owl"/>
<owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/el-constraints.owl"/>
<owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/go-biotic.owl"/>
<owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/go_mf_import.owl"/>
<owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/pato_import.owl"/>
<owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/rohom.owl"/>
<owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/temporal-intervals.owl"/>
and for the owl2obo conversion the top three should stay, or at least that is what I have understood so far.
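For reference, the import declarations themselves can be inspected and pruned with a few lines of standard-library Python. This is only a sketch: `drop_imports` is a hypothetical helper, not an owltools feature, and the sample document is a cut-down stand-in for ro.owl. It edits the declarations only and does nothing about terms that are present in the file for other reasons.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def drop_imports(rdfxml, unwanted):
    """Remove owl:imports whose IRI ends with any name in `unwanted`.

    Returns the serialized XML and the list of import IRIs that were kept.
    """
    root = ET.fromstring(rdfxml)
    kept = []
    for ontology in root.iter(f"{{{OWL}}}Ontology"):
        # Copy the list first so removal does not disturb iteration.
        for imp in list(ontology.findall(f"{{{OWL}}}imports")):
            iri = imp.get(f"{{{RDF}}}resource") or ""
            if any(iri.endswith(name) for name in unwanted):
                ontology.remove(imp)
            else:
                kept.append(iri)
    return ET.tostring(root, encoding="unicode"), kept


# Cut-down illustrative document, not the real ro.owl.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/ro.owl">
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/core.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/go-biotic.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/pato_import.owl"/>
  </owl:Ontology>
</rdf:RDF>"""

pruned_xml, kept = drop_imports(sample, ["go-biotic.owl", "pato_import.owl"])
print(kept)
```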
However, having very little understanding of the web ontology language (OWL stuff), I couldn't go any further.
So, I tried to get it converted to JSON-LD (just a little easier to understand; it does not necessarily solve the problem directly) using the Apache Jena riot tool. Unfortunately, the default options do not honor the import declarations. The other option is to write Java code using the jena library to create a custom converter. At this point I think that is getting well out of scope.
So, could somebody lend me some help here?
That subset solved most of the issues that I have faced so far with ro. Here is a summary of my understanding:
It is solved by following the specification here. In short, the loader does the following lookups, in order, before it gives up:
1. The default-namespace tag.
2. The ontology tag.
This subset strips them out.
As mentioned here by chris. Haven't implemented yet though.
This subset seems to be backward incompatible. The basic relations needed for Chado features are under the BFO namespace. The names of the relations also lack the underscore _ (part of instead of part_of). Both of these were mentioned earlier in this issue.
Note: The subset is yet to be merged and currently resides in ro-for-chado branch.
See the latest commit
The previous commits definitely resolve the default-namespace issue. The backward incompatibility really had no effect; my loading works with or without it. The commit also shows how to modify/customize the ontology using a Makefile and custom scripts. It is so simple, yet I wasn't aware of it; thanks to chris.
Lastly, once this branch is merged, it will be more or less easy to use directly from master (less hacky, I suppose).
Just saw the #95
ro-chado.obo is in the latest RO release https://github.com/oborel/obo-relations/releases/tag/v2016-06-15
The PURL is http://purl.obolibrary.org/obo/ro/subsets/ro-chado.obo
This will always resolve to the latest release version of this subset/transform.
Discussion from list:
cc @scottcain