Create documentation for use of RO in bioinformatics databases, particularly Chado

cmungall commented 9 years ago

Discussion from list:

> So here's my very vague question: what do I need to know?  It seems that
> OBO_REL no longer exists, which is going to be a hassle for many
> Chado-related tools, which considered it fairly fundamental (with its is_a
> and part_of and a consistent cv.name of 'relationship').  I'm guessing BFO
> should completely replace OBO_REL, and RO is not really going to be needed
> in general instances of Chado (though obviously some users may want it, it
> isn't fundamental in the same way).

Hi Scott,

I recommend that Chado use RO. You can get the .obo from here: http://purl.obolibrary.org/obo/ro.obo

Note that this includes some classes from BFO; it should be possible to produce a more minimal relation-only edition/subset.

There are two pain points for you:

 1. Chado needs an edge label for "is_a"
 2. There is code that makes an assumption that there will be a relation with OBO-Format ID "OBO_REL:part_of" 

We can make a special edition just for Chado. For 1, we would add a special stanza with OBO-Format ID "OBO_REL:is_a". This doesn't really make any sense in terms of the mapping to OWL, where SubClassOf is a built-in construct. But we can ignore this, as this edition is just for Chado.

For 2, it's up to you. You can try and coordinate the move for Chado to use BFO:0000050 (which is in RO). Or we can have the Chado edition have the old OBO_REL:part_of as well, with some documentation that the goal is to eventually move away from this.

cc @scottcain

scottcain commented 9 years ago

Hi @cmungall

I'm fine with creating a "Chado edition" of RO; the other thing we would certainly need is a 'derives_from', but it looks like that is present in RO, though I'm noticing that the names of all of the features in RO lack underscores--is that how they'd get stored in Chado as well? Or is that part of what would be in the Chado edition?

Of course, it must also be loadable in Chado, and RO currently fails to load using xsltproc/DBIx::DBStag in Chado. That may be due to the missing is_a or something else; I haven't looked closely at what the problem is. If another method is preferable, that needs to be worked into the build procedure.

cmungall commented 9 years ago

This is also very important for people consuming the .obo version:

https://github.com/oborel/obo-relations/wiki/Identifiers

But this needs to be made clearer. Groups using an ad-hoc obo parser may not be implementing the obo shorthand ID rule, so it may not be obvious how to connect their go.obo file with the ro.obo file, i.e. it's necessary to use the xref in the Typedef stanza in their main obo file to join to ro.obo.
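For anyone with an ad-hoc parser, the join described above can be sketched roughly as follows. This is a minimal illustration in Python, not a full OBO parser: the function name `typedef_xrefs` and the sample stanza text are made up for the example, and it assumes each Typedef carries plain `xref:` lines with no quoted descriptions.

```python
def typedef_xrefs(obo_text):
    """Map each Typedef id (often a shorthand such as part_of) to its xref values."""
    mapping = {}
    in_typedef = False
    current_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Typedef] stanzas are of interest here.
            in_typedef = line == "[Typedef]"
            current_id = None
            continue
        if not in_typedef or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        value = value.split("!")[0].strip()  # drop any trailing "! comment"
        if tag == "id":
            current_id = value
            mapping[current_id] = []
        elif tag == "xref" and current_id is not None:
            mapping[current_id].append(value)
    return mapping


# Illustrative sample only; a real go.obo is much larger.
GO_SAMPLE = """\
format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance

[Typedef]
id: part_of
name: part of
xref: BFO:0000050
is_transitive: true
"""

shorthand_to_ro = typedef_xrefs(GO_SAMPLE)
print(shorthand_to_ro)
```

The resulting dictionary gives, for each shorthand relation ID in the main file, the prefixed ID to look up in ro.obo.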

gabinkley commented 9 years ago

I (SGD) second the vote for "a special stanza with OBO-Format ID 'OBO_REL:is_a'". I appreciate the fact that "is_a" is not correct OWL format, but for practical database usage, it is definitely needed.

How soon would this special stanza be made available?

Thanks in advance!

spficklin commented 9 years ago

Adding to this discussion. We have an OBO parser in Tripal that imports vocabularies into Chado. It seems to get hung up on two issues.

The first is that the 'default-namespace' header is not present. In OBO_REL this had the value 'relationship'. Chado divides the ontologies among two groups of tables (cv and db), and the namespace is used as the default name for the vocabulary. Is the 'default-namespace' header no longer needed, or is it missing from the ro.obo? I see an 'ontology' term in the header, but I don't see this as a proper header attribute in the OBO v1.2 design spec. Is this meant to replace the 'default-namespace'?

Related: the second issue is the inclusion of terms from other vocabularies (e.g. GO, PATO, etc.). There is a bit of discussion above about that. But parsing these non-RO terms into Chado is a bit of an issue because we don't know the namespace for those terms. We could find it in a roundabout way if the vocabulary is already in Chado (e.g. GO), but if it isn't, we don't have the namespace for it. I see in the header section of the ro.obo file there is a remark indicating where these terms are coming from, but unfortunately it seems there is no formal definition for how these remarks should be formed (so we can't guarantee a formal set of parsing rules), and the link is to an OWL document rather than an OBO... :-( We need an OWL parser for Tripal but we don't have it yet.

Can the default-namespace be added back into the header of the ro.obo, or should we be using the 'ontology' term? Also, can the namespace for the referenced ontologies be included in the remark somewhere and can there be a formal definition made to the obo standard for these types of remarks? (maybe there is and I don't know it).

Thanks.

cybersiddhu commented 8 years ago

As far as the default-namespace tag is concerned, there seems to be some information in the obo spec. After parsing an OBO Document, any frame without a namespace is assigned the default-namespace, from the OBO Document header. If this is not specified, the Parser assigns a namespace arbitrarily. It is recommended this is equivalent to the URL or file path from which the document was retrieved.

I use the onto-perl parser, which does not attempt to provide a namespace in the absence of a default-namespace tag (as happens with the latest RO). So my current choice is to generate it from either the file path or the HTTP URL. The file path seems to be a good choice, as my loader currently does not accept remote URLs.

However, as @spficklin asked, it is unclear whether using the ontology tag instead is allowed and how it differs from the default-namespace tag; the two seem to overlap to me. Any ideas or thoughts, @cmungall?

cybersiddhu commented 8 years ago

How about we use the core ontology (core.owl) instead of ro.owl? It seems to bring in the basic relations from the BFO import. Then we could just convert it using owltools. So, what would we miss if we did that? It definitely bypasses the extra imports (GO/PATO etc.) that are present in ro.obo.

cmungall commented 8 years ago

If the basic relations are all you need then you can do this, but I'm not sure it necessarily solves some of the issues above.

If the problem is the classes present (used to define domain/range constraints), then we can filter these out (--remove-tbox in owltools).

cybersiddhu commented 8 years ago

It seems to also filter out the important ones, for example derives from. Is there any way I could choose what to filter, such as a particular import, for example go-biotic and/or pato? My issue is that loading ro will also bring in those imported terms, which conflict when those ontologies are loaded separately. I don't know how they co-exist in Chado, if they need to at all.

cybersiddhu commented 8 years ago

I couldn't figure out a way (using owltools mostly) to filter the imported ontologies, mostly the ones that belong to other standalone ontologies. I tried using the --remove-import-declaration option in owltools; however, even after filtering pato_import.owl, some of the PATO terms still remain. Just to be clear, here are the import declarations in the OWL file:

        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/annotations.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/core.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/el-constraints.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/go-biotic.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/go_mf_import.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/pato_import.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/rohom.owl"/>
        <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/temporal-intervals.owl"/>

For the owl2obo conversion, the top three should stay, or at least that is what I have understood so far. However, having very little understanding of the Web Ontology Language (OWL), I couldn't go any further. So I tried converting it to JSON-LD (just a little easier to understand; it does not necessarily solve the problem directly) using the Apache Jena riot tool. Unfortunately, the default options do not honor the import declarations. The other option is to write Java code using the Jena library to create a custom converter. At this point I think that is getting well out of scope.
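As a small aid for inspecting which imports are declared, the declarations can be listed and split into "keep" and "drop" groups with the Python standard library. This is only a sketch under assumptions: the function names and the shortened sample document are illustrative, and the code only reads the declarations. It does not remove any axioms, so it would not by itself get rid of terms that are already merged into the file.

```python
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def import_iris(rdfxml_text):
    """Return the IRIs named in owl:imports declarations, in document order."""
    root = ET.fromstring(rdfxml_text)
    return [
        el.get(f"{{{RDF_NS}}}resource")
        for el in root.iter(f"{{{OWL_NS}}}imports")
    ]


def split_imports(iris, unwanted_files):
    """Split IRIs into (keep, drop) by comparing the last path segment."""
    keep, drop = [], []
    for iri in iris:
        filename = iri.rsplit("/", 1)[-1]
        (drop if filename in unwanted_files else keep).append(iri)
    return keep, drop


# Shortened, illustrative document with three of the imports listed above.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/ro.owl">
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/core.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/go-biotic.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ro/pato_import.owl"/>
  </owl:Ontology>
</rdf:RDF>
"""

keep, drop = split_imports(import_iris(SAMPLE), {"go-biotic.owl", "pato_import.owl"})
print("keep:", keep)
print("drop:", drop)
```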

So, could somebody lend me some help here?

cybersiddhu commented 8 years ago

That subset solved most of the issues that I have faced so far with ro. Here is a summary of my understanding:

namespace

It is solved by following the specification here. In short, the loader does the following lookups in order before it gives up.

 1. default-namespace tag.
 2. File name parsed from the input file, with the obo extension stripped.
 3. ontology tag.
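The lookup order above could be sketched like this in Python. It is an illustration of the order only, not the actual loader: `header_tags` and `resolve_namespace` are hypothetical names, and the header parsing is deliberately naive (it reads `tag: value` lines until the first stanza).

```python
import os


def header_tags(obo_text):
    """Collect header tag/value pairs, stopping at the first stanza."""
    tags = {}
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break
        if ":" in line and not line.startswith("!"):
            tag, _, value = line.partition(":")
            tags.setdefault(tag.strip(), value.strip())
    return tags


def resolve_namespace(obo_text, path=None):
    """Try default-namespace, then the file name, then the ontology tag."""
    tags = header_tags(obo_text)
    if tags.get("default-namespace"):
        return tags["default-namespace"]
    if path:
        base = os.path.basename(path)
        stem, ext = os.path.splitext(base)
        name = stem if ext == ".obo" else base
        if name:
            return name
    # Returns None when nothing is found, i.e. the loader "gives up".
    return tags.get("ontology") or None


print(resolve_namespace("default-namespace: relationship\nontology: ro\n", "/tmp/ro.obo"))
print(resolve_namespace("ontology: ro\n", "/data/ro-chado.obo"))
print(resolve_namespace("ontology: ro\n"))
```

Note that with this order the ontology tag is only consulted when no file path is available, for example when the text comes from a stream.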

cross ontology terms

This subset strips them out.

obo shorthand identifiers

As mentioned here by Chris. I haven't implemented this yet, though.

backward incompatibility

This subset seems to be backward incompatible. The basic relations needed for Chado features are under the BFO namespace. The names of the relations also lack the underscore _ (part of instead of part_of). Both of these were mentioned earlier in this issue.
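A loader that still expects the old names could bridge the gap with a small translation step, sketched below in Python. The helper names are hypothetical, and the lookup table contains only the part_of pairing mentioned in this issue (BFO:0000050 and OBO_REL:part_of); any other entries would have to be added by whoever uses it.

```python
# Only this pairing comes from the discussion; extend the table as needed.
LEGACY_BY_CURRENT_ID = {"BFO:0000050": "OBO_REL:part_of"}


def legacy_name(label):
    """Turn a space-separated relation name into the underscore form."""
    return "_".join(label.split())


def legacy_id(current_id):
    """Return the legacy ID when one is known, otherwise the ID unchanged."""
    return LEGACY_BY_CURRENT_ID.get(current_id, current_id)


print(legacy_name("part of"))
print(legacy_id("BFO:0000050"))
```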

Note: The subset is yet to be merged and currently resides in the ro-for-chado branch.

cmungall commented 8 years ago

See the latest commit

cybersiddhu commented 8 years ago

The previous commits definitely resolve the default-namespace issue. The backward-compatibility change did not really affect me; my loading works with or without it. The commit also shows how to modify/customize the ontology using a Makefile and custom scripts. It is so simple, but I wasn't aware of it; thanks to Chris. Lastly, once this branch is merged, it will be more or less easy to use directly from master (less hacky, I suppose). Just saw #95.

cmungall commented 8 years ago

ro-chado.obo is in the latest RO release https://github.com/oborel/obo-relations/releases/tag/v2016-06-15

The PURL is http://purl.obolibrary.org/obo/ro/subsets/ro-chado.obo

This will always resolve to the latest release version of this subset/transform.

cmungall commented 8 years ago

Available now:

https://github.com/oborel/obo-relations/wiki/Using-RO-in-Chado
