pombase / canto

The PomBase community curation tool
https://curation.pombase.org

subsets: taxon restrictions and restricting subsets of ontologies from visibility in the curation tool #28

Closed: pombase-admin closed this issue 7 years ago

pombase-admin commented 11 years ago

We should have a way of configuring which parts of the ontologies are presented in the curation tool. One example would be excluding prokaryote-specific terms when annotating pombe.

We would also like to be able to restrict the tool to subsets of an ontology. For example, we want to be able to curate genes with protein features from the Sequence Ontology (and ignore the rest of SO).

See: https://sourceforge.net/tracker/?func=detail&atid=2525791&aid=3465391&group_id=65526

We are going to request the required manipulations from owltools.

Original comment by: ValWood

kimrutherford commented 8 years ago

Should be easy to implement after #1023.

ValWood commented 8 years ago

Looking forward to this..... It might be quite a big benefit for our users if they don't see any of the terms with taxon restrictions when they search (in the drop-down) or in the term children selections on Canto pages.

Keep this in mind for sooner rather than later. I'll put "high priority" on it, but that is purely wishful thinking, so in your own time....

mah11 commented 8 years ago

Taxon restrictions are now part of the "plus" that goes into the go-plus.* files. The GO web site claims it's only available in OWL, but I see an OBO file as well in GO svn, and the PURL works. It's got a slew of other extras as well, and I don't know how you'll want to deal with that.

Description: http://geneontology.org/page/download-ontology#go-plus.owl

Download from:
http://purl.obolibrary.org/obo/go/extensions/go-plus.owl
http://purl.obolibrary.org/obo/go/extensions/go-plus.obo

kimrutherford commented 8 years ago

Thanks Midori. I've had a look. The restrictions look like:

relationship: only_in_taxon NCBITaxon:4751 {id="GOTAX:0000025"} ! Fungi

Hopefully we can use owltools to propagate the only_in_taxon relation to all the terms, then load the result. I'll try that.

We may need to load go-plus.obo instead of go-basic.obo for this to work.
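Because each restriction is an ordinary `relationship:` tag, a loader could pick these lines out with a simple pattern. A minimal Python sketch, assuming one tag per line as in the example above; `parse_relationship` and `Relationship` are hypothetical names (not part of Canto, Chobo or owltools), and the regex ignores OBO escaping rules.

```python
import re
from typing import NamedTuple, Optional

# Matches OBO relationship tags such as:
#   relationship: only_in_taxon NCBITaxon:4751 {id="GOTAX:0000025"} ! Fungi
# i.e. relation, target ID, an optional {...} qualifier block and an optional "! comment".
RELATIONSHIP_RE = re.compile(
    r'^relationship:\s+(?P<rel>\S+)\s+(?P<target>\S+)'
    r'(?:\s+\{(?P<qualifiers>[^}]*)\})?'
    r'(?:\s+!\s*(?P<comment>.*))?$'
)


class Relationship(NamedTuple):
    rel: str
    target: str
    qualifiers: Optional[str]
    comment: Optional[str]


def parse_relationship(line: str) -> Optional[Relationship]:
    """Return the parsed relationship tag, or None if the line is not one."""
    m = RELATIONSHIP_RE.match(line.strip())
    if m is None:
        return None
    return Relationship(m.group("rel"), m.group("target"),
                        m.group("qualifiers"), m.group("comment"))


rel = parse_relationship(
    'relationship: only_in_taxon NCBITaxon:4751 {id="GOTAX:0000025"} ! Fungi')
print(rel.rel, rel.target)  # only_in_taxon NCBITaxon:4751
```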

mah11 commented 8 years ago

We will need to use all of the only_in_taxon flags, and the never_in_taxon flags that mention S. pombe or any higher taxon that includes it (e.g. we need to heed never_in_taxon NCBITaxon:2759 ! Eukaryota, but we don't care about never_in_taxon NCBITaxon:33090 ! Viridiplantae).
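The rule above can be written as a membership test against the organism's lineage. A Python sketch, assuming a term's own constraints have already been collected as (relation, taxon) pairs; the abbreviated lineage and `violates_taxon` are hypothetical, and the check covers only the term's own constraints, not ones inherited from ancestor terms.

```python
# Hypothetical, abbreviated lineage for S. pombe: only taxa mentioned in this
# thread are listed; a real loader would take the full lineage from NCBITaxon.
POMBE_LINEAGE = frozenset({
    "NCBITaxon:4896",  # Schizosaccharomyces pombe
    "NCBITaxon:4751",  # Fungi
    "NCBITaxon:2759",  # Eukaryota
})


def violates_taxon(constraints, lineage=POMBE_LINEAGE):
    """Return True if any (relation, taxon) constraint on a term excludes the organism.

    only_in_taxon X  -> excluded unless X is in the organism's lineage
    never_in_taxon Y -> excluded if Y is in the organism's lineage
    Union IDs (NCBITaxon_Union:...) are not expanded, so they only match if listed.
    """
    for rel, taxon in constraints:
        if rel == "only_in_taxon" and taxon not in lineage:
            return True
        if rel == "never_in_taxon" and taxon in lineage:
            return True
    return False


print(violates_taxon([("never_in_taxon", "NCBITaxon:2759")]))   # True: Eukaryota includes pombe
print(violates_taxon([("never_in_taxon", "NCBITaxon:33090")]))  # False: Viridiplantae does not
print(violates_taxon([("only_in_taxon", "NCBITaxon:4751")]))    # False: pombe is a fungus
```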

kimrutherford commented 8 years ago

> I guess we need a generic way of doing this based on taxon, for configuration?

Yep. Hopefully the configuration will be as simple as having a list like:

Because the restrictions are just relations in the OBO file it will be very easy to find which terms to ignore by reading the owltools output.

I have a slight worry that the go-plus.obo file will have other differences from go-basic.obo that will trip us up. We'll only know when we try. I hope to try today or early next week.

kimrutherford commented 8 years ago

I'm trying a load

I mean I'm trying to load go-plus.obo into the test Canto - not a Chado load. I don't think we need to change the Chado loading.

kimrutherford commented 8 years ago

I'm trying a load with go-plus.obo now without any other changes.

That failed, so I'll try Chobo (#1164).

kimrutherford commented 8 years ago

> I have a slight worry that the go-plus.obo file will have other differences from go-basic.obo that will trip us up.

It seems fine. After #1164 the GO terms, the taxon subsets and the taxon terms are loaded. We get some extra stuff that isn't so useful like UBERON terms but we can ignore those for now.

> Hopefully the configuration will be as simple as having a list like:

I'd like to have a chat about this before I go ahead to make sure I know what I'm doing.

kimrutherford commented 8 years ago

I've tried loading go-plus.obo into the test Canto. It looks fine to me. I'm going to change the ontology update script to load it into the main Canto tonight.

So if you see any problems in the next few days that might be why. It won't take long to load go-simple.obo again if things go wrong.

kimrutherford commented 8 years ago

Need to implement #1258 first.

kimrutherford commented 8 years ago

> Need to implement #1258 first.

Turns out that #1258 wasn't needed (but was good to fix anyway).

owltools has a --make-species-subset option that should handle "never_in_taxon" and "only_in_taxon" and do the right thing with no configuration. We just hand it a taxon ID and it does all the inference.

With the --make-species-subset flag, owltools should produce an OBO file with just pombe terms. I've tried the command line that Chris M suggested and it's nearly OK so I think this will work.

We need to add a command line like this to the Canto ontology loader script:

owltools go-plus.obo --reasoner elk --make-species-subset -t NCBITaxon:4896 \
     -o -f obo go-pombe-subset.obo
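One way a loader could run this step, sketched in Python purely for illustration: the function names are hypothetical and this is not Canto's loader code; the flags are exactly those in the command above, and owltools is assumed to be on the PATH.

```python
import subprocess


def owltools_species_subset_args(ontology="go-plus.obo", taxon="NCBITaxon:4896",
                                 output="go-pombe-subset.obo"):
    """Build the owltools argument list from the command line quoted above."""
    return ["owltools", ontology,
            "--reasoner", "elk",
            "--make-species-subset", "-t", taxon,
            "-o", "-f", "obo", output]


def make_species_subset(**kwargs):
    # check=True raises CalledProcessError if owltools exits non-zero,
    # so a failed subset step is not silently ignored by the caller.
    subprocess.run(owltools_species_subset_args(**kwargs), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems, and the taxon parameter keeps the same step usable for a Canto instance curating a different organism.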
kimrutherford commented 8 years ago

There's an owltools issue that I've reported: owlcollab/owltools#164

kimrutherford commented 8 years ago

I've been looking at the output of owltools and I have a question:

If term A has the relationship:

relationship: never_in_taxon NCBITaxon:4896 {id="GOTAX:0010058"} ! Schizosaccharomyces pombe

and term B is part_of A, should owltools infer that term B is:

never_in_taxon NCBITaxon:4896

as well?

As a concrete example, we shouldn't load "synaptonemal complex" into pombe Canto because it is never_in_taxon NCBITaxon:4896:

[Term]
id: GO:0000795
name: synaptonemal complex
namespace: cellular_component
alt_id: GO:0005716
def: "A proteinaceous scaffold found between homologous chromosomes during meiosis." [GOC:elh]
xref: Wikipedia:Synaptonemal_complex
is_a: GO:0044454 ! nuclear chromosome part
relationship: never_in_taxon NCBITaxon:4896 {id="GOTAX:0010058"} ! Schizosaccharomyces pombe
relationship: part_of GO:0000794 ! condensed nuclear chromosome

but what about:

[Term]
id: GO:0000800
name: lateral element
namespace: cellular_component
def: "A proteinaceous core found between sister chromatids during meiotic prophase." [GOC:elh]
synonym: "axial element" EXACT [GOC:ascb_2009, GOC:dph, GOC:tb]
is_a: GO:0044454 ! nuclear chromosome part
relationship: part_of GO:0000795 ! synaptonemal complex

which is part_of "synaptonemal complex"?
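If never_in_taxon is inherited downwards, "lateral element" would be excluded along with "synaptonemal complex". A minimal Python sketch of that propagation over the two stanzas above, assuming restrictions flow to descendants over is_a and part_of; `excluded_terms` is a hypothetical helper doing a plain graph walk, not how owltools builds the subset (it uses a reasoner, ELK in the command above).

```python
from collections import defaultdict, deque

# child -> [(relation, parent), ...], copied from the two stanzas above
EDGES = {
    "GO:0000795": [("is_a", "GO:0044454"), ("part_of", "GO:0000794")],  # synaptonemal complex
    "GO:0000800": [("is_a", "GO:0044454"), ("part_of", "GO:0000795")],  # lateral element
}
NEVER_IN_POMBE = {"GO:0000795"}  # never_in_taxon NCBITaxon:4896


def excluded_terms(edges, restricted, follow=("is_a", "part_of")):
    """Return the restricted terms plus every descendant reachable via `follow` relations."""
    children = defaultdict(set)
    for child, parents in edges.items():
        for rel, parent in parents:
            if rel in follow:
                children[parent].add(child)
    excluded = set(restricted)
    queue = deque(restricted)
    while queue:  # breadth-first walk from each restricted term down to its descendants
        term = queue.popleft()
        for child in children[term]:
            if child not in excluded:
                excluded.add(child)
                queue.append(child)
    return excluded


print(sorted(excluded_terms(EDGES, NEVER_IN_POMBE)))
# ['GO:0000795', 'GO:0000800']
print(sorted(excluded_terms(EDGES, NEVER_IN_POMBE, follow=("is_a",))))
# ['GO:0000795']
```

The second call shows the difference the question hinges on: if only is_a were followed, "lateral element" would survive.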

ValWood commented 8 years ago

This is a problem with GO. Pombe does not have a (canonical) synaptonemal complex, but it does have lateral elements. Lateral element should not be part_of synaptonemal complex (if anything, it's the other way around? i.e. synaptonemal complex part_of 'lateral element'), or there should be some grouping term.

This raises the question of why we aren't currently getting a taxon violation report for lateral element annotations. @mah11 @cmungall any ideas? I thought the taxon restrictions propagated downwards?

ValWood commented 8 years ago

https://github.com/geneontology/go-ontology/issues/12684 https://github.com/geneontology/go-ontology/issues/12683

ValWood commented 8 years ago

So Kim, in summary, what you are doing is correct. So you can proceed. Based on this, we would not currently see "lateral element" in Canto, but this should be resolved when the parentage is fixed in GO.

kimrutherford commented 8 years ago

> So Kim, in summary, what you are doing is correct.

I was asking about this because the owltools --make-species-subset doesn't remove the "lateral element". It nearly removes "synaptonemal complex" except for owlcollab/owltools#164.

So is it OK for "lateral element" to be loaded?

kimrutherford commented 8 years ago

> So is it OK for "lateral element" to be loaded?

I've read through again and now I understand more I think. So no need to reply to that.

ValWood commented 8 years ago

I think lateral element should not be loaded (even though we have used it, correctly), its parentage in GO is incorrect. It should not be a descendant of a term which is taxon restricted for fission yeast. Make sense?

kimrutherford commented 8 years ago

> Make sense?

Yep, it does now thanks.

I'll see what the answer is to geneontology/go-ontology#12683 then follow up with an owltools issue if needed.

ValWood commented 8 years ago

All of the above discussion about "synaptonemal complex" vs. "lateral element" you can ignore. Pombe has its own special term "linear element" for the "proteinaceous structure between meiotic chromosomes" because the community are clear that pombe does not have a "synaptonemal complex". If the term "synaptonemal complex" is used more broadly to describe the structure (which is how it is defined), I can't see any reason not to call it a synaptonemal complex. This will require a change of mindset, which I am now working on.....

So at present we should not include it (or its descendants) because of the taxon restriction. This restriction will most likely be lifted. Sorry for the confusion!

kimrutherford commented 8 years ago

I'm still confused.

"lateral element" is part_of "synaptonemal complex"

"synaptonemal complex" has the taxon restriction: never_in_taxon NCBITaxon:4896 {id="GOTAX:0010058"} ! Schizosaccharomyces pombe so it doesn't appear in the output of the "owltools --make-species-subset ..."

but "lateral element" does appear in the output of owltools despite being part_of "synaptonemal complex"

Is owltools doing the right thing? Should it infer that "lateral element" is never_in_taxon pombe?

cmungall commented 8 years ago

hmm, my output is different:

$ owltools --use-catalog go-plus.owl --reasoner elk --make-species-subset -t NCBITaxon:4896 -o -f obo pombe.obo
....chit chat....
$ grep GO:0000800 pombe.obo || echo 'not found'
not found
kimrutherford commented 8 years ago

> hmm, my output is different:

Ah, thanks Chris. I wasn't using --use-catalog. With that flag GO:0000800 is gone.

cmungall commented 8 years ago

that's kind of weird. The results should be the same, the only difference is that it will use a local copy (assuming you are in the svn dir) rather than the same file pulled over http...

kimrutherford commented 8 years ago

> (assuming you are in the svn dir)

Sorry, I should have mentioned that I initially wasn't running it in the SVN dir. With --use-catalog, owltools was giving an error about a missing catalog file. I found the file it needed in ontology/extensions, so I ran owltools there with the flag and it worked.

kimrutherford commented 8 years ago

It turns out that Chobo drops stanzas without a "name:" tag, so we have a workaround for owlcollab/owltools#164. It does generate a lot of warnings, though.
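The workaround amounts to filtering out stanzas that have no "name:" tag. For anyone needing the same filter outside Chobo, a Python sketch; the function names are hypothetical, this is not Chobo's code, and it returns the dropped IDs instead of emitting a warning per stanza.

```python
def split_stanzas(obo_text):
    """Split OBO text into header lines and a list of stanzas (each a list of lines)."""
    header, stanzas = [], []
    current = None
    for line in obo_text.splitlines():
        if line.startswith("["):  # e.g. [Term] or [Typedef] starts a new stanza
            current = [line]
            stanzas.append(current)
        elif current is None:
            header.append(line)
        else:
            current.append(line)
    return header, stanzas


def drop_nameless_stanzas(obo_text):
    """Return (filtered_text, dropped_ids), removing stanzas that lack a name: tag."""
    header, stanzas = split_stanzas(obo_text)
    kept, dropped = [], []
    for stanza in stanzas:
        if any(line.startswith("name:") for line in stanza[1:]):
            kept.append(stanza)
        else:
            dropped.extend(line[len("id:"):].strip()
                           for line in stanza if line.startswith("id:"))
    lines = header + [line for stanza in kept for line in stanza]
    return "\n".join(lines) + "\n", dropped
```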

I've loaded the pombe-only go-plus.obo into the test Canto and it all seems fine.

So Val: could you try a few terms in the test Canto to see if it is as you'd expect? If it looks good we can update the main pombe Canto and finally close this issue.

ValWood commented 8 years ago

It works a bit :)

If I search on GO:0032501 multicellular organismal process

GO:0032501 multicellular organismal process never_in_taxon 4896 Schizosaccharomyces pombe

I don't see it,

BUT I see quite a few descendants:

[Screenshot: "test taxon"]

kimrutherford commented 8 years ago

I checked the first three in the completion list and they aren't children of "multicellular organismal process" and I can't see any taxon restrictions on the actual parent terms.

ValWood commented 8 years ago

Ah OK, I'll do a GO ticket for that..... and try some more...

ValWood commented 8 years ago

OK, I had another go. I find GO:0046858 chlorosome, which HAS only_in_taxon Prokaryota.

Or am I only checking never_in at the moment?

ValWood commented 8 years ago

GO:0005814 centriole has never_in_taxon Fungi

I don't find this! Cool!

ValWood commented 8 years ago

requesting lots of new taxon restrictions ;) https://github.com/geneontology/go-ontology/issues/12690

kimrutherford commented 8 years ago

> GO:0046858 chlorosome HAS only_in_taxon Prokaryota
> Or am I only checking never_in at the moment?

I assumed that owltools was applying the never_in and only_in constraints. I thought that terms with "only_in_taxon Prokaryota" etc. would be dropped.

cmungall commented 8 years ago

it should be

kimrutherford commented 8 years ago

There are a few stanzas with only_in_taxon Prokaryota in the pombe-only go-plus.obo I generated. Most of the other stanzas with only_in_taxon (like all the Vertebrata and Viridiplantae ones) have been removed by owltools. The never_in_taxon constraints seem to be working.

Should I make an owltools issue about this?

Here's an only_in_taxon Prokaryota example:

[Term]
id: GO:0009291
name: unidirectional conjugation
namespace: biological_process
def: "The process of unidirectional (polarized) transfer of genetic ..." [ISBN:0387520546]
subset: gosubset_prok
synonym: "mating" BROAD []
is_a: GO:0000746 ! conjugation
is_a: GO:0009292 ! genetic transfer
relationship: only_in_taxon NCBITaxon_Union:0000004 {id="GOTAX:0000117"} ! Prokaryota

Here's the only non-Prokaryota example I can see:

[Term]
id: GO:0009766
name: primary charge separation
namespace: biological_process
def: "In the photosynthetic reaction centers, primary charge separation is initiated by the excitation of a molecule followed by the transfer of an electron to an electron acceptor molecule following energy transfer from light harvesting complexes." [ISBN:0792361431]
is_a: GO:0022904 ! respiratory electron transport chain
relationship: only_in_taxon NCBITaxon_Union:0000007 {id="GOTAX:0000169"} ! Viridiplantae or Bacteria or Euglenozoa
relationship: part_of GO:0019684 ! photosynthesis, light reaction

And here's the full OBO file: https://curation.pombase.org/kmr44/go-plus-pombe-only.obo
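A check like this can be automated by scanning the generated file for stanzas that still carry only_in_taxon. A Python sketch, assuming the stanza layout shown in the examples above; `leftover_only_in_taxon` is a hypothetical helper, and it only lists the remaining constraints without judging whether they apply to pombe (union IDs such as NCBITaxon_Union:0000004 would still need expanding into their member taxa).

```python
import re

# Captures the taxon (or taxon union) ID from lines such as:
#   relationship: only_in_taxon NCBITaxon_Union:0000004 {id="GOTAX:0000117"} ! Prokaryota
ONLY_IN_RE = re.compile(r"^relationship:\s+only_in_taxon\s+(\S+)")


def leftover_only_in_taxon(obo_text):
    """Map each stanza ID to the only_in_taxon targets it still carries."""
    found = {}
    stanza_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):  # a new stanza begins; forget the previous ID
            stanza_id = None
        elif line.startswith("id:"):
            stanza_id = line[len("id:"):].strip()
        else:
            m = ONLY_IN_RE.match(line)
            if m and stanza_id:
                found.setdefault(stanza_id, []).append(m.group(1))
    return found
```

Usage would be along the lines of `leftover_only_in_taxon(open("go-plus-pombe-only.obo").read())`, then reviewing each reported term by hand.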

cmungall commented 8 years ago

I think I know but yes make a ticket thanks

(Reply sent by email to Kim Rutherford's comment of 30 Sep 2016, 16:38.)

ValWood commented 8 years ago

Kim, I was going to add a note to this ticket about the taxon restriction that confused us. It was a taxon restriction that worked, but that was not in the obo file you were using, so we could not figure out how it was working...can you remember the specific example? I think it was a "never in metazoa"

kimrutherford commented 8 years ago

The term that confused us was GO:0033173 - "calcineurin-NFAT signaling cascade". It doesn't have a never_in_taxon or an only_in_taxon relationship but owltools --make-species-subset manages to exclude it anyway. We couldn't work out how it's configured.

ValWood commented 8 years ago

Oh, I think we know what that is. GO:0033173 "calcineurin-NFAT signaling cascade" has a taxon restriction in GO, and we have annotations to its 'regulates' descendants. These do NOT have taxon restrictions in the OBO file because regulates is not being followed for taxon restrictions.

https://github.com/geneontology/go-ontology/issues/12701

However, because you are (I guess) following all is_a, part_of and regulates relations, you are correctly assuming that all descendants are taxon violations, even though these do not have a taxon restriction in the OBO file.

Would that do it?

ValWood commented 8 years ago

So I still see "primary charge separation" (only in prokaryotes). This ticket is "waiting for external change", right?

ValWood commented 7 years ago

Isn't this a duplicate of https://github.com/pombase/canto/issues/1340? Closing; reopen if not...