Closed: pombase-admin closed this issue 7 years ago.
Should be easy to implement after #1023.
Looking forward to this... It could be a big benefit for our users if they don't see any terms with taxon restrictions when they search (in the drop-down) or in the term children selections on Canto pages.
Please keep this in mind as a sooner-rather-than-later item. I'll put "high priority" on it, but that is purely wishful thinking, so in your own time...
Taxon restrictions are now part of the "plus" that goes into the go-plus.* files. The GO web site claims it's only available in OWL, but I see an OBO file as well in GO svn, and the PURL works. It's got a slew of other extras as well, and I don't know how you'll want to deal with that.
Description: http://geneontology.org/page/download-ontology#go-plus.owl
Download from: http://purl.obolibrary.org/obo/go/extensions/go-plus.owl or http://purl.obolibrary.org/obo/go/extensions/go-plus.obo
Thanks Midori. I've had a look. The restrictions look like:
relationship: only_in_taxon NCBITaxon:4751 {id="GOTAX:0000025"} ! Fungi
Hopefully we can use owltools to propagate the only_in_taxon relation to all the terms, then load the result. I'll try that.
We may need to load go-plus.obo instead of go-basic.obo for this to work.
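If the restriction lines ever need to be read directly (for example, to check what owltools produced), a line like the only_in_taxon Fungi one above splits cleanly into relation, taxon ID and label. A minimal Python sketch, assuming the line layout shown above; the regular expression and the name `parse_taxon_restriction` are illustrative and not part of Canto or owltools:

```python
import re

# Matches taxon restriction lines such as:
#   relationship: only_in_taxon NCBITaxon:4751 {id="GOTAX:0000025"} ! Fungi
# Group 1: relation, group 2: taxon ID (plain or union), group 3: trailing label.
TAXON_LINE = re.compile(
    r'^relationship:\s+(only_in_taxon|never_in_taxon)\s+'
    r'(NCBITaxon(?:_Union)?:\d+)'
    r'(?:\s*\{[^}]*\})?'
    r'(?:\s*!\s*(.*))?$'
)

def parse_taxon_restriction(line):
    """Return (relation, taxon_id, label), or None if the line is not a taxon restriction."""
    m = TAXON_LINE.match(line.strip())
    if m is None:
        return None
    relation, taxon, label = m.groups()
    return relation, taxon, (label or "").strip()

print(parse_taxon_restriction(
    'relationship: only_in_taxon NCBITaxon:4751 {id="GOTAX:0000025"} ! Fungi'))
# ('only_in_taxon', 'NCBITaxon:4751', 'Fungi')
```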
We will need to use all of the only_in_taxon flags, and the never_in_taxon flags that mention S. pombe or any higher taxon that includes it (e.g. we need to heed never_in_taxon NCBITaxon:2759 ! Eukaryota, but we don't care about never_in_taxon NCBITaxon:33090 ! Viridiplantae).
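The rule above can be written as a small predicate over a term's own restrictions. This is a sketch only: `POMBE_LINEAGE` is a hypothetical, abbreviated lineage containing just the taxa mentioned in this thread (a real version would read the full NCBI lineage), and union taxa are treated as opaque IDs rather than expanded.

```python
# Hypothetical, abbreviated S. pombe lineage: only taxa mentioned in this thread.
POMBE_LINEAGE = {
    "NCBITaxon:4896",  # Schizosaccharomyces pombe
    "NCBITaxon:4751",  # Fungi
    "NCBITaxon:2759",  # Eukaryota
}

def is_excluded(restrictions, lineage=POMBE_LINEAGE):
    """Decide whether a term should be hidden for the organism with this lineage.

    restrictions: list of (relation, taxon_id) pairs from the term's stanza.
    - never_in_taxon T hides the term if T is the organism or one of its ancestors.
    - only_in_taxon T hides the term unless T is the organism or one of its ancestors.
    """
    for relation, taxon in restrictions:
        if relation == "never_in_taxon" and taxon in lineage:
            return True
        if relation == "only_in_taxon" and taxon not in lineage:
            return True
    return False

print(is_excluded([("never_in_taxon", "NCBITaxon:2759")]))   # Eukaryota: heeded
print(is_excluded([("never_in_taxon", "NCBITaxon:33090")]))  # Viridiplantae: ignored
```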
I guess we need a generic way of doing this based on taxon, for configuration?
Yep. Hopefully the configuration will be as simple as having a list like:
Because the restrictions are just relations in the OBO file it will be very easy to find which terms to ignore by reading the owltools output.
I have a slight worry that the go-plus.obo file will have other differences from go-basic.obo that will trip us up. We'll only know when we try. I hope to try today or early next week.
I'm trying a load
I mean I'm trying to load go-plus.obo into the test Canto - not a Chado load. I don't think we need to change the Chado loading.
I'm trying a load with go-plus.obo now without any other changes.
That failed so I'll try Chobo - #1164
It seems fine. After #1164 the GO terms, the taxon subsets and the taxon terms are loaded. We get some extra stuff that isn't so useful like UBERON terms but we can ignore those for now.
I'd like to have a chat about this before I go ahead to make sure I know what I'm doing.
I've tried loading go-plus.obo into the test Canto. It looks fine to me. I'm going to change the ontology update script to load it into the main Canto tonight.
So if you see any problems in the next few days that might be why. It won't take long to load go-simple.obo again if things go wrong.
Need to implement #1258 first.
Turns out that #1258 wasn't needed (but was good to fix anyway).
owltools has a --make-species-subset option that should handle "never_in_taxon" and "only_in_taxon" and do the right thing with no configuration. We just hand it a taxon ID and it does all the inference.
With the --make-species-subset flag, owltools should produce an OBO file with just pombe terms. I've tried the command line that Chris M suggested and it's nearly OK so I think this will work.
We need to add a command like this to the Canto ontology loader script:
owltools go-plus.obo --reasoner elk --make-species-subset -t NCBITaxon:4896 \
-o -f obo go-pombe-subset.obo
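If the loader shells out to owltools, the command can be built as an argument list so no shell quoting is involved. A Python sketch, assuming `owltools` is on the PATH; only flags that appear in this thread are used, and the helper names are hypothetical:

```python
import shlex
import subprocess

def species_subset_cmd(input_file, taxon_id, output_file, use_catalog=False):
    """Build the owltools --make-species-subset argument list (hypothetical helper)."""
    cmd = ["owltools"]
    if use_catalog:
        # Use the local catalog file; per this thread, owltools must be run
        # from the directory that contains it.
        cmd.append("--use-catalog")
    cmd += [input_file, "--reasoner", "elk",
            "--make-species-subset", "-t", taxon_id,
            "-o", "-f", "obo", output_file]
    return cmd

def make_species_subset(input_file, taxon_id, output_file, use_catalog=False):
    cmd = species_subset_cmd(input_file, taxon_id, output_file, use_catalog)
    # check=True raises CalledProcessError if owltools exits with an error
    subprocess.run(cmd, check=True)

# Print the command instead of running it
print(shlex.join(species_subset_cmd("go-plus.obo", "NCBITaxon:4896", "go-pombe-subset.obo")))
```

Calling `make_species_subset("go-plus.obo", "NCBITaxon:4896", "go-pombe-subset.obo")` would then run the same command as above.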
There's an owltools issue that I've reported: owlcollab/owltools#164
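To see which terms the species subset actually removed, one approach (not taken from this thread) is to compare the term IDs in go-plus.obo with those in the generated subset file. A minimal Python sketch that only looks at `[Term]` stanzas; the function names are illustrative:

```python
def term_ids(obo_text):
    """Collect the id: of every [Term] stanza in an OBO document."""
    ids = set()
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Entering a new stanza: only [Term] stanzas are of interest
            in_term = line == "[Term]"
        elif in_term and line.startswith("id:"):
            ids.add(line[len("id:"):].strip())
    return ids

def dropped_terms(full_obo_text, subset_obo_text):
    """Term IDs present in the full ontology but missing from the species subset."""
    return sorted(term_ids(full_obo_text) - term_ids(subset_obo_text))
```

Usage: read go-plus.obo and go-pombe-subset.obo into strings and pass them to `dropped_terms`; each ID returned is a term the subset step left out.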
I've been looking at the output of owltools and I have a question:
If term A has the relationship:
relationship: never_in_taxon NCBITaxon:4896 {id="GOTAX:0010058"} ! Schizosaccharomyces pombe
and term B is part_of A, should owltools infer that term B is:
never_in_taxon NCBITaxon:4896
as well?
As a concrete example we shouldn't load "synaptonemal complex" into pombe Canto because it is never_in_taxon NCBITaxon:4896:
[Term]
id: GO:0000795
name: synaptonemal complex
namespace: cellular_component
alt_id: GO:0005716
def: "A proteinaceous scaffold found between homologous chromosomes during meiosis." [GOC:elh]
xref: Wikipedia:Synaptonemal_complex
is_a: GO:0044454 ! nuclear chromosome part
relationship: never_in_taxon NCBITaxon:4896 {id="GOTAX:0010058"} ! Schizosaccharomyces pombe
relationship: part_of GO:0000794 ! condensed nuclear chromosome
but what about:
[Term]
id: GO:0000800
name: lateral element
namespace: cellular_component
def: "A proteinaceous core found between sister chromatids during meiotic prophase." [GOC:elh]
synonym: "axial element" EXACT [GOC:ascb_2009, GOC:dph, GOC:tb]
is_a: GO:0044454 ! nuclear chromosome part
relationship: part_of GO:0000795 ! synaptonemal complex
which is part_of "synaptonemal complex"?
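The inference being asked about amounts to pushing the restriction on "synaptonemal complex" down to every term that reaches it through is_a or part_of. A toy Python sketch using only the edges from the two stanzas above; whether the owltools reasoner treats part_of this way is exactly the open question, so the set of followed relations is an assumption:

```python
from collections import defaultdict

# (child, relation, parent) edges taken from the two stanzas above
EDGES = [
    ("GO:0000795", "is_a", "GO:0044454"),     # synaptonemal complex
    ("GO:0000795", "part_of", "GO:0000794"),
    ("GO:0000800", "is_a", "GO:0044454"),     # lateral element
    ("GO:0000800", "part_of", "GO:0000795"),
]
NEVER_IN_POMBE = {"GO:0000795"}  # never_in_taxon NCBITaxon:4896

def propagate_down(restricted, edges, follow=("is_a", "part_of")):
    """Return the restricted terms plus every term that reaches one of them
    through the followed relations (i.e. their descendants)."""
    children = defaultdict(list)
    for child, relation, parent in edges:
        if relation in follow:
            children[parent].append(child)
    result = set(restricted)
    stack = list(restricted)
    while stack:
        for child in children[stack.pop()]:
            if child not in result:
                result.add(child)
                stack.append(child)
    return result

print(sorted(propagate_down(NEVER_IN_POMBE, EDGES)))
# ['GO:0000795', 'GO:0000800']
```

Following only is_a (`follow=("is_a",)`) would leave "lateral element" in, which is the behaviour being questioned.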
This is a problem with GO. Pombe does not have a (canonical) synaptonemal complex, but it does have lateral elements. Lateral element should not be part_of synaptonemal complex (if anything, it's the other way around? Synaptonemal complex is part_of the 'lateral elements', or there should be some grouping term).
This raises the question of why we aren't currently getting a taxon violation report for lateral element annotations. @mah11 @cmungall, any ideas? I thought the taxon restrictions propagated downwards?
So Kim, in summary, what you are doing is correct. So you can proceed. Based on this, we would not currently see "lateral element" in Canto, but this should be resolved when the parentage is fixed in GO.
I was asking about this because the owltools --make-species-subset doesn't remove the "lateral element". It nearly removes "synaptonemal complex" except for owlcollab/owltools#164.
So is it OK for "lateral element" to be loaded?
I've read through again and now I understand more I think. So no need to reply to that.
I think lateral element should not be loaded (even though we have used it, correctly); its parentage in GO is incorrect. It should not be a descendant of a term that is taxon restricted for fission yeast. Make sense?
Yep, it does now thanks.
I'll see what the answer is to geneontology/go-ontology#12683 then follow up with an owltools issue if needed.
All of the above discussion about "synaptonemal complex" vs. "lateral element" you can ignore. Pombe has its own special term "linear element" for the "proteinaceous structure between meiotic chromosomes" because the community is clear that pombe does not have a "synaptonemal complex". If the term "synaptonemal complex" is used more broadly to describe the structure (which is how it is defined), I can't see any reason not to call it a synaptonemal complex. This will require a change of mindset, which I am now working on...
So at present we should not include it (or its descendants) because of the taxon restriction. This restriction will most likely be lifted. Sorry for the confusion!
I'm still confused.
"lateral element" is part_of "synaptonemal complex"
"synaptonemal complex" has the taxon restriction: never_in_taxon NCBITaxon:4896 {id="GOTAX:0010058"} ! Schizosaccharomyces pombe so it doesn't appear in the output of the "owltools --make-species-subset ..."
but "lateral element" does appear in the output of owltools despite being part_of "synaptonemal complex"
Is owltools doing the right thing? Should it infer that "lateral element" is never_in_taxon pombe?
hmm, my output is different:
$ owltools --use-catalog go-plus.owl --reasoner elk --make-species-subset -t NCBITaxon:4896 -o -f obo pombe.obo
....chit chat....
$ grep GO:0000800 pombe.obo || echo 'not found'
not found
Ah, thanks Chris. I wasn't using --use-catalog. With that flag GO:0000800 is gone.
that's kind of weird. The results should be the same, the only difference is that it will use a local copy (assuming you are in the svn dir) rather than the same file pulled over http...
Sorry, I should have mentioned that I initially wasn't running it in the SVN dir. With --use-catalog, owltools was giving an error about a missing catalog file. I found the file it needed in ontology/extensions, so I ran owltools there with the flag and it worked.
It turns out that Chobo drops stanzas without a "name:" tag, so we have a workaround for owlcollab/owltools#164. It does generate a lot of warnings, though.
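The same filtering can be approximated outside Chobo, for example to check a generated file before loading it. A Python sketch that drops every stanza lacking a `name:` tag and logs a warning for each one; the function name is hypothetical and this is not Chobo's code:

```python
import sys

def drop_nameless_stanzas(obo_text, log=sys.stderr):
    """Return obo_text without stanzas ([Term], [Typedef], ...) that lack a name: tag."""
    header, stanzas, current = [], [], None
    for line in obo_text.splitlines():
        if line.startswith("["):
            current = [line]          # start of a new stanza
            stanzas.append(current)
        elif current is None:
            header.append(line)       # file header before the first stanza
        else:
            current.append(line)

    kept = []
    for stanza in stanzas:
        if any(l.startswith("name:") for l in stanza[1:]):
            kept.append(stanza)
        else:
            ident = next((l for l in stanza if l.startswith("id:")), "id: ?")
            print(f"warning: dropping stanza without name: {ident}", file=log)
    lines = header + [l for stanza in kept for l in stanza]
    return "\n".join(lines) + "\n"
```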
I've loaded the pombe-only go-plus.obo into the test Canto and it all seems fine.
So Val, could you try a few terms in the test Canto to see if it is as you'd expect? If it looks good we can update the main pombe Canto and finally close this issue.
It works a bit :)
If I search on GO:0032501 multicellular organismal process
GO:0032501 multicellular organismal process never_in_taxon 4896 Schizosaccharomyces pombe
I don't see it,
BUT I see quite a few descendants:
I checked the first three in the completion list and they aren't children of "multicellular organismal process" and I can't see any taxon restrictions on the actual parent terms.
Ah OK, I'll do a GO ticket for that..... and try some more...
OK, I had another go. I find GO:0046858 chlorosome, which HAS only_in_taxon Prokaryota.
Or am I only checking never_in at the moment?
GO:0005814 centriole has never_in_taxon Fungi
I don't find this! Cool!
requesting lots of new taxon restrictions ;) https://github.com/geneontology/go-ontology/issues/12690
I assumed that owltools was applying the never_in and only_in constraints. I thought that terms with "only_in_taxon Prokaryota" etc. would be dropped.
it should be
There are a few stanzas with only_in_taxon Prokaryota in the pombe-only go-plus.obo I generated. Most of the other stanzas with only_in_taxon (like all the Vertebrata and Viridiplantae ones) have been removed by owltools. The never_in_taxon constraints seem to be working.
Should I make an owltools issue about this?
Here's an only_in_taxon Prokaryota example:
[Term]
id: GO:0009291
name: unidirectional conjugation
namespace: biological_process
def: "The process of unidirectional (polarized) transfer of genetic ..." [ISBN:0387520546]
subset: gosubset_prok
synonym: "mating" BROAD []
is_a: GO:0000746 ! conjugation
is_a: GO:0009292 ! genetic transfer
relationship: only_in_taxon NCBITaxon_Union:0000004 {id="GOTAX:0000117"} ! Prokaryota
Here's the only non-Prokaryota example I can see:
[Term]
id: GO:0009766
name: primary charge separation
namespace: biological_process
def: "In the photosynthetic reaction centers, primary charge separation is initiated by the excitation of a molecule followed by the transfer of an electron to an electron acceptor molecule following energy transfer from light harvesting complexes." [ISBN:0792361431]
is_a: GO:0022904 ! respiratory electron transport chain
relationship: only_in_taxon NCBITaxon_Union:0000007 {id="GOTAX:0000169"} ! Viridiplantae or Bacteria or Euglenozoa
relationship: part_of GO:0019684 ! photosynthesis, light reaction
And here's the full OBO file: https://curation.pombase.org/kmr44/go-plus-pombe-only.obo
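Leftovers like these can be found by scanning the generated file for `only_in_taxon` lines whose taxon is not in the organism's lineage. A Python sketch, assuming one tag per line as in an OBO file and a caller-supplied lineage set; union taxa are not expanded, so every union ID not in the set is reported for manual review. The function name is hypothetical:

```python
import re

ONLY_IN = re.compile(r'^relationship:\s+only_in_taxon\s+(\S+)')

def leftover_only_in(obo_text, lineage):
    """List (term_id, taxon) for [Term] stanzas whose only_in_taxon taxon is not
    in the given lineage, i.e. terms a species subset should have removed."""
    found, term_id, in_term = [], None, False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_term, term_id = line == "[Term]", None
        elif in_term and line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif in_term:
            m = ONLY_IN.match(line)
            if m and m.group(1) not in lineage:
                found.append((term_id, m.group(1)))
    return found
```

Usage: pass the text of go-plus-pombe-only.obo and a lineage set such as `{"NCBITaxon:4896", "NCBITaxon:4751", "NCBITaxon:2759"}` (an abbreviated, hypothetical pombe lineage).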
I think I know, but yes, make a ticket. Thanks.
(Reply by email to Kim Rutherford's comment of 30 Sep 2016, 16:38.)
Kim, I was going to add a note to this ticket about the taxon restriction that confused us. It was a taxon restriction that worked, but it was not in the OBO file you were using, so we could not figure out how it was working. Can you remember the specific example? I think it was a "never in metazoa".
The term that confused us was GO:0033173 - "calcineurin-NFAT signaling cascade". It doesn't have a never_in_taxon or an only_in_taxon relationship but owltools --make-species-subset manages to exclude it anyway. We couldn't work out how it's configured.
Oh, I think we know what that is. GO:0033173 "calcineurin-NFAT signaling cascade" has a taxon restriction in GO, and we have annotations to its regulates descendants. These do NOT have taxon restrictions in the OBO file because regulates is not being followed for taxon restrictions.
https://github.com/geneontology/go-ontology/issues/12701
However, because (I guess) you are following all is_a, part_of and regulates relations, you are correctly treating all the descendants as taxon violations, even though they do not have a taxon restriction in the OBO file.
Would that do it?
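The difference described here comes down to which relations are followed when restrictions are propagated. A toy Python sketch with placeholder IDs (`EX:P` for a restricted process, `EX:R` for a term that regulates it, `EX:R2` for an is_a child of `EX:R`; none are real GO terms). Whether owltools follows regulates in exactly this way is an assumption:

```python
from collections import defaultdict

# Hypothetical toy graph: (child, relation, parent) edges with placeholder IDs
EDGES = [
    ("EX:R", "regulates", "EX:P"),
    ("EX:R2", "is_a", "EX:R"),
]

STRUCTURAL = {"is_a", "part_of"}
WITH_REGULATES = STRUCTURAL | {"regulates", "positively_regulates", "negatively_regulates"}

def descendants_via(roots, edges, follow):
    """Roots plus every term reaching them through the followed relations."""
    children = defaultdict(list)
    for child, relation, parent in edges:
        if relation in follow:
            children[parent].append(child)
    seen, stack = set(roots), list(roots)
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

print(sorted(descendants_via({"EX:P"}, EDGES, STRUCTURAL)))      # ['EX:P']
print(sorted(descendants_via({"EX:P"}, EDGES, WITH_REGULATES)))  # ['EX:P', 'EX:R', 'EX:R2']
```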
So, I still see "primary charge separation" (only in prokaryotes). This ticket is "waiting for external change", right?
Isn't this a duplicate of https://github.com/pombase/canto/issues/1340 ? Closing; reopen if not.
We should have a way of configuring which parts of the ontologies are presented in the curation tool. One example might be excluding prokaryote-specific terms when annotating pombe.
We would also like to be able to restrict the tool to subsets of an ontology. For example, we want to be able to curate genes with protein features from the Sequence Ontology (and ignore the rest of SO).
See: https://sourceforge.net/tracker/?func=detail&atid=2525791&aid=3465391&group_id=65526
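Restricting the tool to part of an ontology, as in the SO example, amounts to keeping a configured root term and its is_a descendants. A minimal Python sketch; the thread does not say which SO term would be the root, so the root and edges in the usage line are placeholders:

```python
from collections import defaultdict

def subtree(root, is_a_edges):
    """Return root plus every term that reaches it via (child, parent) is_a edges."""
    children = defaultdict(list)
    for child, parent in is_a_edges:
        children[parent].append(child)
    keep, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in keep:
                keep.add(child)
                stack.append(child)
    return keep

# Placeholder IDs only
print(sorted(subtree("ROOT", [("A", "ROOT"), ("B", "A"), ("C", "OTHER")])))
```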
We are going to request the required manipulations from owltools.
Original comment by: ValWood