linikujp / owltools

Automatically exported from code.google.com/p/owltools

Easy way to retrieve gci_relations? #102

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi, is there any simple way in owltools to retrieve gci_relations?

I am thinking of doing this: iterate over the OWLSubClassOfAxioms, using the 
isGCI() method; when an axiom is a GCI, check whether the subclass is an 
OWLObjectIntersectionOf with a taxon as one of the operands; retrieve the 
other operand and treat it as the subclass; expand the superclass using 
OWLGraphWrapperEdges.getOutgoingEdges(OWLObject); the taxon could be stored in 
a new attribute in OWLGraphEdge.

Would it be the proper way to do it? Isn't there some similar code already 
written?
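
The steps proposed above can be sketched in plain Java. This is only a toy model under stated assumptions: the Expr, Named, Some, And, SubClassOf and GciEdge types and the gciEdges method are hypothetical stand-ins, not owltools or OWL API classes, and the taxon restriction is assumed to be written as "part_of some Taxon". Only superclasses of the form "R some D" with a named D are handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a graph-style GCI lookup; none of these types come from owltools or the OWL API. */
final class GciSketch {
    interface Expr {}
    record Named(String id) implements Expr {}                    // named class
    record Some(String property, Expr filler) implements Expr {}  // "property some filler"
    record And(List<Expr> operands) implements Expr {}            // intersection of expressions
    record SubClassOf(Expr sub, Expr sup) {}
    record GciEdge(String source, String property, String target, String taxon) {}

    static final String PART_OF = "part_of"; // assumed property used for the taxon restriction

    /** Collects one taxon-annotated edge per axiom "(C and part_of some T) SubClassOf R some D". */
    static List<GciEdge> gciEdges(List<SubClassOf> axioms) {
        List<GciEdge> edges = new ArrayList<>();
        for (SubClassOf ax : axioms) {
            // Keep only axioms whose left-hand side is a two-operand intersection.
            if (!(ax.sub() instanceof And and) || and.operands().size() != 2) continue;
            String source = null;
            String taxon = null;
            for (Expr op : and.operands()) {
                if (op instanceof Named n) {
                    source = n.id();          // the operand treated as the subclass
                } else if (op instanceof Some s && s.property().equals(PART_OF)
                        && s.filler() instanceof Named t) {
                    taxon = t.id();           // the taxon the relation is restricted to
                }
            }
            if (source == null || taxon == null) continue;
            // Expand only superclasses of the form "R some D" where D is a named class.
            if (ax.sup() instanceof Some sup && sup.filler() instanceof Named target) {
                edges.add(new GciEdge(source, sup.property(), target.id(), taxon));
            }
        }
        return edges;
    }
}
```

In real code, the superclass expansion would go through OWLGraphWrapperEdges.getOutgoingEdges(OWLObject) as described above; the sketch only shows where the taxon would be attached to the resulting edge.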

Original issue reported on code.google.com by frederic...@gmail.com on 21 Aug 2014 at 1:48

GoogleCodeExporter commented 9 years ago
There isn't a way already written in the OWLGraphWrapper. This way sounds 
feasible. I believe the isGCI() step is redundant - if your test for the LHS of 
the SubClassOf axiom yields an OWLObjectIntersectionOf (or any other anonymous 
class expression) then you know it's a GCI. But no matter.

At the start of the query, do you know the taxon of interest? Or are you 
interested in all paths from a node for all species? In the latter case, there 
are cases where it may get tricky - e.g. if the path involves traversal through 
incompatible species. If you know the species at the start it's easier.

There are other approaches involving the reasoner. See InferredParentRenderer 
and the corresponding --export-parents command. The strategy here is: for any 
taxon of interest T, make the assumption that everything in the world belongs 
to T:

  Thing SubClassOf part_of some T

This will of course make some classes unsatisfiable (the same strategy can be 
used for making taxon subsets); ignore these. Then, for every property of 
interest P and every class C, make a class

  CP = P some C

Then perform subclass checks as normal using the reasoner. If you have

  C1P SubClassOf C2P

Then you know (in graph terms) there is an edge of type P between C1 and C2 in 
T.

I have recently noticed a problem with this approach: Elk does not deal 
well with large swathes of the ontology being unsatisfiable (this is the reason 
why we don't make the taxon stage reports below fish).

A variant is to avoid the problematic everything-is-a-fish assertion and to 
make classes

 CPT = (P some C) and (part_of some T)

(but the same issue may arise, since large numbers of CPTs may be 
unsatisfiable, e.g. parts of fingers in fish).

TL;DR - the reasoner-based approach is in some ways more mathematically elegant 
but it may be more straightforward to go with your approach.

Original comment by cmung...@gmail.com on 22 Aug 2014 at 3:02

GoogleCodeExporter commented 9 years ago
Thanks for the explanations. Could you give me an example of a path that 
"involves traversal through incompatible species"? 
I am not sure I see the problem: if I display a graph for all 
vertebrates, some relations will of course differ between fishes and 
mammals, but as long as each edge is explicitly displayed, that seems fine 
to me.

In any case, the "one species at a time" approach would be much easier to write 
for me, and my main aim is to propagate gene expression data, so for each gene 
I can retrieve the relations for the corresponding species. 

I will try to implement something in OWLGraphWrapperEdgesExtended.
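
For the one-species-at-a-time case, here is a minimal sketch of the lookup. It assumes the GCI edges already carry their taxon and that the taxonomy is available as a child-to-parent map with a single parent per taxon. TaxonEdgeFilter, GciEdge, selfAndAncestors and edgesFor are hypothetical names, not owltools methods.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy filter: keeps the GCI edges that apply to one species. Not owltools code. */
final class TaxonEdgeFilter {
    record GciEdge(String source, String property, String target, String taxon) {}

    /** Returns the taxon itself plus all of its ancestors, following child-to-parent links. */
    static Set<String> selfAndAncestors(String taxon, Map<String, String> parentOf) {
        Set<String> result = new HashSet<>();
        // Stop at the root (no parent) or if a taxon repeats (guards against cycles).
        for (String t = taxon; t != null && result.add(t); t = parentOf.get(t)) { }
        return result;
    }

    /** Keeps the edges stated for the species itself or for any of its parent taxa. */
    static List<GciEdge> edgesFor(String species, List<GciEdge> edges,
                                  Map<String, String> parentOf) {
        Set<String> valid = selfAndAncestors(species, parentOf);
        return edges.stream()
                .filter(e -> valid.contains(e.taxon()))
                .collect(Collectors.toList());
    }
}
```

With gene expression data, edgesFor would be called once per gene with that gene's species, so a relation stated for a parent taxon is picked up while one stated for an unrelated taxon is dropped.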

Original comment by frederic...@gmail.com on 22 Aug 2014 at 5:01

GoogleCodeExporter commented 9 years ago
The multi-species path concern may be theoretical. If the edges of the path are 
annotated for the user there is no problem, so long as there is no implied 
biological association between the end nodes.

Original comment by cmung...@gmail.com on 22 Aug 2014 at 8:59

GoogleCodeExporter commented 9 years ago
Just wondering: if "Thing SubClassOf part_of some T", then we can infer, for 
every class C, "C SubClassOf part_of some T". I would then expect a reasoner, 
when calling "getSuperClasses", to be able to identify GCI axioms with 
"IntersectionOf(C, part_of some T)" as subclass (or any parent taxon), and to 
return the superclass.

Why isn't it that simple?

Original comment by frederic...@gmail.com on 29 Aug 2014 at 11:29

GoogleCodeExporter commented 9 years ago
This is exactly the strategy that's used for producing your stage-by-taxon 
reports with InferredParentRenderer. There are a few issues:

1. It can render large swathes of the ontology unsatisfiable, especially when T 
diverges from Uberon's areas of strength. E.g. setting T=trichoplax will make 
everything bar epithelium and a handful of other classes unsatisfiable. Not a 
problem in theory, perfectly logical, but in practice it's an issue as Elk 
seems to consume massive amounts of memory. This is why we only produce your 
reports for vertebrates.
2. The reasoner API does not allow you to fetch anonymous superclasses - e.g. 
partOf some X.

For 1, a possible solution is to first remove classes using the old-school 
graph-traversal-based taxon constraint method. Or simply remove the 
constraining axioms, such as the taxon disjoints.

For 2, we have a solution in place in owltools, the expression-materializing 
reasoner.

So it can be done, but it feels like a lot of machinery. There is this tension 
between doing things the pure DL way, which often involves a lot of 
pre-processing and boilerplate, vs. taking a more graph-traversal-oriented 
approach (i.e. extensions to the OWLGraphWrapper).

Original comment by cmung...@gmail.com on 29 Aug 2014 at 11:43

GoogleCodeExporter commented 9 years ago
OK, thanks again.

Original comment by frederic...@gmail.com on 30 Aug 2014 at 1:31

GoogleCodeExporter commented 9 years ago
OK, so I added basic support for OBO GCI relations.

Please check r2335 for modifications in OWLGraphEdge and OWLGraphWrapperEdges. 
It shouldn't change any existing behavior.

r2336 implements the retrieval of GCI relations in 
OWLGraphWrapperEdgesExtended. It would be nice if you could have a look, but 
this is brand new code used only by me I guess :p
(notably, I assume that OBO GCI relations are always between OWLClasses)

r2337 modifies OWLGraphManipulator for relation reduction etc.; I guess it 
doesn't really matter.

Original comment by frederic...@gmail.com on 2 Sep 2014 at 3:35

GoogleCodeExporter commented 9 years ago
Heiko can check this when he gets back next week.

I'm off next week.

Yes, OBO syntax allows only a limited form of GCIs: the LHS of the expression 
must be a class expression of the form "R some C", where C is a (non-anonymous) 
OWLClass.

Original comment by cmung...@gmail.com on 5 Sep 2014 at 9:19

GoogleCodeExporter commented 9 years ago
I have used these new methods quite a lot, and I think they work as expected. 
Please check whenever you like.

Original comment by frederic...@gmail.com on 15 Oct 2014 at 6:28