Open bschilder opened 7 months ago
Using another distribution of UPHENO gives a different set of errors.
command (setting remove_cyclic_paths = TRUE, remove_rings = TRUE doesn't make a difference):
dag <- simona::import_ontology("https://github.com/obophenotype/upheno/raw/master/upheno.owl", remove_cyclic_paths = TRUE, remove_rings = TRUE)
output:
Downloading https://github.com/obophenotype/upheno/raw/master/upheno.owl...
Converting file8e3674551214_upheno.owl to the obo format.
'/usr/bin/java' -jar '/private/var/folders/rd/rbc_wrdj4k3djf3brk6z0_dc0000gp/T/RtmpIPZrCZ/robot_temp_8e3627754717.jar' convert --input '/private/var/folders/rd/rbc_wrdj4k3djf3brk6z0_dc0000gp/T/RtmpIPZrCZ/file8e3674551214_upheno.owl' --format obo --output '/var/folders/rd/rbc_wrdj4k3djf3brk6z0_dc0000gp/T//RtmpIPZrCZ/file8e367d161ab6.obo.gz' --check false
Error in import_obo(output, verbose = verbose, ...) :
Cannot find any [Term].
I should mention, in previous versions of simona (sometime before the changes implemented to address #6) I was successfully able to import this same ontology.
Hi @jokergoo this still seems to be an issue. Would really appreciate your help in fixing this.
Thanks, Brian
For this error:
Error: Too many cyclic paths (> 1000).
I let the function exit when the number of cyclic paths > 1000 because I would assume the DAG structure is wrong. What do you think about how to deal with such scenarios?
And https://github.com/obophenotype/upheno/raw/master/upheno.owl is empty:
I think I used an inefficient way to count cyclic paths. The method used in the package is: if a subset of terms forms cyclic paths, I count all possible combinations of paths that are cyclic, which may generate a big value, for example if 5 terms are completely connected.
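As a rough illustration of how quickly such a count can grow, here is a small base-R sketch. It assumes "completely connected" means every pair of terms is linked in both directions, and it counts distinct simple cycles. The function name is hypothetical and this is not the counting code used in simona.

```r
# Count distinct simple directed cycles among n terms, assuming every
# ordered pair of terms is linked (an assumption made for illustration).
# Choosing k of the n terms gives choose(n, k) subsets, and each subset
# can be arranged into (k - 1)! distinct cycles.
count_simple_cycles <- function(n) {
  stopifnot(n >= 2)
  k <- 2:n
  sum(choose(n, k) * factorial(k - 1))
}

count_simple_cycles(5)            # 84
sapply(2:8, count_simple_cycles)  # growth as more terms are involved
```

Under these assumptions, five fully connected terms already give 84 cycles, so a few such clusters could plausibly push a naive count past a fixed limit.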
And it seems there are quite a lot of duplications in the .obo converted from upheno.owl (which is what import_ontology() does internally).
If you use the owl parser I wrote in the package, import_owl(), it can be successfully imported:
> dag = import_owl("~/upheno.owl")
Parsing 421 <owl:ObjectProperty> ...
remove 9 obsolete terms
Parsing 176174 <owl:Class> ...
Parsing 147847 <rdf:Description> ...
remove 2411 obsolete terms
There are more than one root:
APO:0000001, APO:0000006, APO:0000018, BFO:0000001, BTO:0000000,
and other 60 terms ...
A super root (~~all~~) is added.
> dag
An ontology_DAG object:
Source: http://ontology.com/someuri.owl,
173643 terms / 328749 relations
Root: ~~all~~
Terms: APO:0000001, APO:0000002, APO:0000003, APO:0000004, ...
Max depth: 41
Avg number of parents: 1.89
Avg number of children: 1.95
Aspect ratio: 730.79:1 (based on the longest distance from root)
2354:1 (based on the shortest distance from root)
Relations: is_a
With the following columns in the metadata data frame:
id, short_id, name, namespace, definition
I will check why import_ontology() produces so many duplications, because import_ontology() is the suggested way to import general ontologies in practice.
Thanks for looking into this @jokergoo
Maybe @nicolevasilevsky and @matentzn would have some insights as to whether this is expected.
I should also mention that there's the OBO Library URL for this same ontology, but it seems to have the same issue.
dag <- simona::import_ontology( "https://purl.obolibrary.org/obo/upheno/v2/upheno.owl", remove_cyclic_paths = TRUE, remove_rings = TRUE)
But as you say, using the import_owl function works:
o=simona::import_owl("https://purl.obolibrary.org/obo/upheno/v2/upheno.owl")
I thought simona::import_ontology simply called the relevant function depending on the input file type, but it seems there are some other differences as well.
Now I have updated the package and you can update from GitHub.
The previous way of detecting cyclic paths was: for each node, I go to all its downstream terms and check whether there are cyclic paths (because I assumed cyclic paths are rare in a "correctly formatted ontology"). This means that if A and B are cyclic, i.e. A<->B, and they are both downstream of C and D:
C->...->A<->B
D->...->A<->B
the cyclic path A<->B will be reported multiple times: once for C, once for D, and once for each of C's and D's offspring.
Now I have improved the code so that A<->B is reported only once.
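One common way to report each cycle-closing link only once is a depth-first search that marks nodes as unvisited, in progress, or done. The base-R sketch below uses a hypothetical toy graph matching the C/D/A/B example. It is a generic textbook technique, and the thread does not say whether simona works this way.

```r
# Hypothetical toy graph as a parent -> children adjacency list.
children <- list(
  C = "A",
  D = "A",
  A = "B",
  B = "A"   # B -> A closes the cycle A <-> B
)

# Depth-first search with three node states. A link that points to a node
# still "in_progress" closes a cycle; nodes marked "done" are never
# re-explored, so each such link is recorded exactly once.
find_back_edges <- function(children) {
  nodes <- union(names(children), unlist(children, use.names = FALSE))
  state <- setNames(rep("unvisited", length(nodes)), nodes)
  back_edges <- character(0)

  visit <- function(node) {
    state[node] <<- "in_progress"
    for (child in children[[node]]) {
      if (state[child] == "in_progress") {
        back_edges <<- c(back_edges, paste(node, "->", child))
      } else if (state[child] == "unvisited") {
        visit(child)
      }
    }
    state[node] <<- "done"
  }

  for (node in nodes) {
    if (state[node] == "unvisited") visit(node)
  }
  back_edges
}

find_back_edges(children)  # "B -> A", reported once although C and D both reach it
```

Note that this counts cycle-closing links rather than every distinct cyclic path, so its numbers are not directly comparable to the path counts printed by the package.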
In the following code, upheno.obo is converted from upheno.owl by robot.jar (this is exactly what import_ontology() does internally). I also import with the native import_owl(). You can see that the two DAGs, although not identical, are almost the same.
# the same as import_ontology("~/upheno.owl")
> dag1 = import_obo("~/upheno.obo", remove_cyclic_paths = TRUE)
Parsing [Typedef] sections in the obo file [443/443]
remove 10 obsolete terms
Parsing [Term] sections in the obo file [175632/175632]
remove 1822 obsolete terms
There are more than one root:
APO:0000001, APO:0000006, APO:0000018, BFO:0000001, BTO:0000000,
and other 47 terms ...
A super root (~~all~~) is added.
Removed 13 cyclic paths.
> dag2 = import_owl("~/upheno.owl")
Parsing 421 <owl:ObjectProperty> ...
remove 9 obsolete terms
Parsing 176174 <owl:Class> ...
Parsing 147847 <rdf:Description> ...
remove 2411 obsolete terms
There are more than one root:
APO:0000001, APO:0000006, APO:0000018, BFO:0000001, BTO:0000000,
and other 60 terms ...
A super root (~~all~~) is added.
> dag1
An ontology_DAG object:
Source: http://ontology.com/someuri.owl,
173674 terms / 328845 relations
Root: ~~all~~
Terms: 397443, APO:0000001, APO:0000002, APO:0000003, ...
Max depth: 41
Avg number of parents: 1.89
Avg number of children: 1.95
Aspect ratio: 732.83:1 (based on the longest distance from root)
2357.19:1 (based on the shortest distance from root)
Relations: is_a
With the following columns in the metadata data frame:
id, short_id, name, namespace, definition
> dag2
An ontology_DAG object:
Source: http://ontology.com/someuri.owl,
173643 terms / 328749 relations
Root: ~~all~~
Terms: APO:0000001, APO:0000002, APO:0000003, APO:0000004, ...
Max depth: 41
Avg number of parents: 1.89
Avg number of children: 1.95
Aspect ratio: 730.79:1 (based on the longest distance from root)
2354:1 (based on the shortest distance from root)
Relations: is_a
With the following columns in the metadata data frame:
id, short_id, name, namespace, definition
I thought simona::import_ontology simply called the relevant function depending on the input file type, but it seems there's some other differences as well.
This is the ultimate goal. For now the importing functions are not perfect. I will make import_ontology() an "entrance function" after I improve import_obo()/import_owl()/import_ttl().
Hey both! I know some sources of cycles, but it would be best if you could share a handful of concrete ones (3-10). This way I can tell you concretely why they exist and how to avoid them!
Here they are
[CL:0000164 ~ CL:0000163 ~ CL:0000164]
[FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00005106]
[FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00007694 ~ FBbt:00005106]
[FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00007696 ~ FBbt:00005106]
[FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00007696 ~ FBbt:00007695 ~ FBbt:00005106]
[FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00007696 ~ FBbt:00007697 ~ FBbt:00005106]
[FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00007698 ~ FBbt:00005106]
[FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00007698 ~ FBbt:00110127 ~ FBbt:00005106]
[FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00047736 ~ FBbt:00005106]
[FBbt:00048472 ~ FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00048472]
[FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00052132 ~ FBbt:00005106]
[FBbt:00005106 ~ FBbt:00047097 ~ FBbt:00005106]
[CL:0000556 ~ CL:4033018 ~ CL:0000556]
What I do for removing cycles is to remove the last link.
These are very surprising cycles. What relationships are you using? Just isa?
Only isa. Take [FBbt:00005106 ~ FBbt:00007693 ~ FBbt:00005106] for example (the obo file is converted from upheno.owl by robot.jar):
[Term]
id: FBbt:00005106
name: neuron
namespace: fly_anatomy.ontology
def: "Electrically active cell of the nervous system which has an axon and/or dendrite and which is synapsed with other cells and/or has other cells synapsed to it." [https://orcid.org/0000-0002-7073-9172]
def: "Electrically active cell of the nervous system which has an axon and/or dendrite and which is synapsed with other cells and/or has other cells synapsed to it." [FBC:DOS]
subset: cur
subset: EmbDevSlim
subset: FB_gloss
subset: larval_OF
subset: scrnaseq_slim
synonym: "nerve cell" RELATED []
xref: CL:0000540
xref: VFB:FBbt_00005106
is_a: CL:0000540 ! neuron
is_a: FBbt:00007693 {gci_relation="receives_synaptic_input_from_neuron", gci_filler="FBbt:00005124"} ! sensory system neuron
is_a: FBbt:00007694 {gci_filler="FBbt:00005918", gci_relation="receives_synaptic_input_from_neuron"} ! thermosensory system neuron
is_a: FBbt:00007695 {gci_filler="FBbt:00005927", gci_relation="receives_synaptic_input_from_neuron"} ! gustatory system neuron
is_a: FBbt:00007696 {gci_relation="receives_synaptic_input_from_neuron", gci_filler="FBbt:00005922"} ! chemosensory system neuron
is_a: FBbt:00007697 {gci_relation="receives_synaptic_input_from_neuron", gci_filler="FBbt:00005926"} ! olfactory system neuron
is_a: FBbt:00007698 {gci_filler="FBbt:00005919", gci_relation="receives_synaptic_input_from_neuron"} ! mechanosensory system neuron
is_a: FBbt:00047097 {gci_relation="develops_from", gci_filler="FBbt:00047097"} ! primary neuron
is_a: FBbt:00047736 {gci_relation="receives_synaptic_input_from_neuron", gci_filler="FBbt:00004211"} ! visual system neuron
is_a: FBbt:00048472 {gci_relation="receives_synaptic_input_from_neuron", gci_filler="FBbt:00048471"} ! nociceptive system neuron
is_a: FBbt:00052132 {gci_relation="receives_synaptic_input_from_neuron", gci_filler="FBbt:00051439"} ! proprioceptive system neuron
is_a: FBbt:00100318 ! somatic cell
is_a: FBbt:00110127 {gci_relation="receives_synaptic_input_from_neuron", gci_filler="FBbt:00007684"} ! auditory system neuron
intersection_of: CL:0000540 ! neuron
intersection_of: part_of NCBITaxon:7227 ! Drosophila melanogaster
relationship: part_of FBbt:00005093 ! nervous system
property_value: IAO:0000589 "neuron (drosophila)" xsd:string
and
[Term]
id: FBbt:00007693
name: sensory system neuron
namespace: fly_anatomy.ontology
def: "Any neuron (FBbt:00005106) that capable of part of some sensory perception (GO:0007600)." [FBC:MMC]
xref: VFB:FBbt_00007693
is_a: FBbt:00005106 ! neuron
intersection_of: FBbt:00005106 ! neuron
intersection_of: capable_of_part_of GO:0007600 ! sensory perception
relationship: part_of FBbt:00007692 ! sensory system
😱 oooooooooo.. Nooooo! 😱
is_a: FBbt:00007693 {gci_relation="receives_synaptic_input_from_neuron", gci_filler="FBbt:00005124"} ! sensory system neuron
This is not an isa relation. Scaaary, it makes total sense for someone without deep knowledge of OBO who is building a parser to stumble across this. This is a GCI! It means something like (sorry I will definitely say this wrong, but you get the picture):
"every neuron that receives synaptic input from a sensory neuron is a sensory system neuron"
I am sure @dosumis will tell me off for relating this wrongly, but the point is: this is not a regular isa relation!
EDIT:
Solution:
If an is_a line in the obo contains any key starting with the word "gci", it's probably best to ignore that isa relation.
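A minimal base-R sketch of that filtering rule, using a few lines copied from the FBbt:00005106 stanza shown above. The variable names are hypothetical, and this is only an illustration of the idea, not how simona's import_obo() is implemented.

```r
# A few lines copied from the [Term] stanza for FBbt:00005106.
term_lines <- c(
  'id: FBbt:00005106',
  'is_a: CL:0000540 ! neuron',
  'is_a: FBbt:00007693 {gci_relation="receives_synaptic_input_from_neuron", gci_filler="FBbt:00005124"} ! sensory system neuron',
  'is_a: FBbt:00100318 ! somatic cell'
)

# An is_a line whose {...} qualifier block contains a key starting with
# "gci" encodes a GCI rather than a plain parent link.
is_isa  <- startsWith(term_lines, "is_a:")
has_gci <- grepl("[{][^}]*gci_", term_lines)

# Keep only plain is_a lines and extract the parent term IDs.
plain_isa <- term_lines[is_isa & !has_gci]
parents <- sub("^is_a:\\s*(\\S+).*$", "\\1", plain_isa, perl = TRUE)
parents  # "CL:0000540" "FBbt:00100318"
```

With the GCI line dropped, FBbt:00005106 no longer lists FBbt:00007693 as a parent, so the cycle between those two terms does not arise.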
@matentzn Thanks for this information! I am not in this field and I mainly work with Gene Ontology.
Do you know or have any resource introducing such "gci"- or related things (link, book, articles)? Then I can integrate and improve this tool.
Package updated.
id: FBbt:00005106
name: neuron
is_a: FBbt:00007693 {gci_relation="receives_synaptic_input_from_neuron", gci_filler="FBbt:00005124"} ! sensory system neuron
"every neuron that receives synaptic input from a sensory neuron is a sensory system neuron"
Ha, now I understand it. Thanks!
Do you know or have any resource introducing such "gci"- or related things (link, book, articles)? Then I can integrate and improve this tool.
The best place I can point you to is the spec (https://owlcollab.github.io/oboformat/doc/obo-syntax.html#5), in particular section 5.2.2.
My general sense is that it would be better to either invest in the RDFXML reader, or imo even better, in an obographs-json reader. The obographs-json format is easier to interpret for downstream users than the OBO format. However, I guess practically both are needed...
@matentzn An introduction to GCIs might be a good OBO Academy topic.
I'd prefer it if OBO conversion didn't try to represent these. It can still be lossless by storing untranslated axioms in the header. Don't know if that's an option in our pipelines though
I also second use of OBO graphs JSON over custom OBO parsers - if that's an option here
When trying to import the unified phenotype ontology (UPHENO), I run into some errors about
Too many cyclic paths (> 1000).
Is there something different about this ontology? If so, is there a way it can still be imported by simona?
Thanks in advance!
command:
output:
Running with the remove_cyclic_paths and/or remove_rings args doesn't seem to help (same error).
Session info