pombase / website

PomBase website v2
MIT License

function process links #137

Closed ValWood closed 3 years ago

ValWood commented 7 years ago

From https://github.com/pombase/website/issues/84 I pulled out all the conversation to do with F-P linking. I'm still not sure what needs to happen, but can re-evaluate when necessary. So that we don't need to read the entire ticket again, these appear to be the relevant parts:

history

Midori: The MF-BP links are in the go. and go-plus. versions of the GO files. Annotations that can be inferred by following MF-BP links should be in the "inferred GAF" files (ours is gene_association.pombase.inf.gaf), so in theory you could get them by loading that .inf.gaf file from GO Jenkins. BUT in practice, something's been wrong for months - see this ticket, which I opened last December: geneontology/go-site#2226

Since we can't rely on the GO Jenkins inference file yet, once you get the ontology links, annotation counting should work - at present I think they're all MF part_of BP (and in the longer term, even if they add any other link types it'll be the MF part_of BP ones that we'll want to use).

So, on to the question of whether we should include MF-BP links ...

I think it would be entirely accurate to follow MF-part_of-BP links, because it would effectively be saying "this gene product is involved in this process, because it does this MF which is always part of the process" - if that's ever not true, there's a problem with the ontology.
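For illustration only, here is a minimal Python sketch of what "following MF-part_of-BP links" would mean for annotation inference. The three links are the examples quoted later in this thread; the gene names, the `mf_annotations` dict and the `infer_bp_annotations` helper are hypothetical, and this is not how PomBase or GO actually implement it. It also ignores is_a ancestors of the annotated MF term, which a real implementation would need to walk as well.

```python
# MF part_of BP links: the three examples quoted in this thread.
MF_PART_OF_BP = {
    "GO:0008233": {"GO:0006508"},  # peptidase activity part_of proteolysis
    "GO:0071164": {"GO:0036261"},  # RNA trimethylguanosine synthase activity part_of 7-methylguanosine cap hypermethylation
    "GO:0005484": {"GO:0061025"},  # SNAP receptor activity part_of membrane fusion
}


def infer_bp_annotations(mf_annotations, mf_part_of_bp):
    """Return {gene: set of BP term IDs} implied by each gene's MF annotations."""
    inferred = {}
    for gene, mf_terms in mf_annotations.items():
        bp_terms = set()
        for mf_term in mf_terms:
            # A gene product with this MF is, by the part_of link, involved in the BP.
            bp_terms |= mf_part_of_bp.get(mf_term, set())
        if bp_terms:
            inferred[gene] = bp_terms
    return inferred


# Hypothetical genes and annotations, purely for demonstration.
mf_annotations = {
    "geneA": {"GO:0008233"},
    "geneB": {"GO:0005484", "GO:0008233"},
    "geneC": {"GO:0003674"},  # no MF-BP link for this term in the table above
}
inferred = infer_bp_annotations(mf_annotations, MF_PART_OF_BP)
```

With this toy input, geneA picks up proteolysis, geneB picks up both proteolysis and membrane fusion, and geneC gets nothing.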

What I can't guess as easily is whether it's what users would expect, or whether it would foul up anyone's usage of the counts - or especially the search results - to have BP terms retrieved by a query for MF. For counts on gene pages I suppose we could just make an executive decision, document the hell out of it, and hope for the best. When it comes to search behavior, it might be wise to offer an explicit option to include or exclude inter-ontology retrieval. (And I hope that's technically feasible!)

val: The current situation is that although we don't include F-P links explicitly in PomBase, we haven't needed to because the annotations are automatically materialized by GOC. If this did not happen the counts would not reflect the biology for biological process correctly.

Looking back, the last news was that materializing inferred MF-BP annotations is still likely (geneontology/go-annotation#1427), so the counts should be largely correct for most terms.

However, we might need to use F-P links, because in some cases I have blocked/filtered display of some modification process annotations so that they do not appear on the gene pages. For example, "protein phosphorylation" from inferred links is suppressed so that we don't see a redundant "protein phosphorylation" annotation for every "protein kinase activity" annotation, where the "phosphorylation" annotation is not required because it is not telling the user anything about the biological role.

Where "protein phosphorylation" is displayed, it is because the term is explicitly annotated. However, the counts (and the queries) should include annotations which would be generated via F-P links. I can't figure out if the F-P links are being followed in the current PomBase (counts or queries)... I think probably not. I'll check next time I come across a protein kinase annotation that doesn't have an explicit phosphorylation annotation.
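A rough Python sketch of the behaviour described here: inferred "protein phosphorylation" annotations are hidden from the gene page display but still counted and returned by queries. The `Annotation` class, the `SUPPRESS_IF_INFERRED` set, the helper names and the gene names are all made up for illustration; this is an assumption about how it could be modelled, not the actual PomBase code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    inferred: bool  # True if derived via an MF part_of BP link


# Hypothetical suppression list: BP terms hidden on gene pages when only inferred.
SUPPRESS_IF_INFERRED = {"GO:0006468"}  # protein phosphorylation


def displayed(annotations):
    """Annotations shown on a gene page: drop suppressed terms unless explicitly annotated."""
    return [
        a for a in annotations
        if not (a.inferred and a.term in SUPPRESS_IF_INFERRED)
    ]


def genes_for_term(annotations, term):
    """Genes counted/queried for a term: inferred annotations are included."""
    return {a.gene for a in annotations if a.term == term}


# Made-up genes: kinase1 has only the MF annotation plus its inferred BP,
# kinase2 also has an explicit (curated) phosphorylation annotation.
annotations = [
    Annotation("kinase1", "GO:0004672", inferred=False),  # protein kinase activity
    Annotation("kinase1", "GO:0006468", inferred=True),
    Annotation("kinase2", "GO:0004672", inferred=False),
    Annotation("kinase2", "GO:0006468", inferred=False),
]

shown_terms = {(a.gene, a.term) for a in displayed(annotations)}
counted_genes = genes_for_term(annotations, "GO:0006468")
```

Here kinase1 would not show "protein phosphorylation" on its gene page, but both genes would be counted for the term.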

midori: One of Kim's comments on this ticket suggests that PomBase2 loading/display/etc. isn't doing anything at our end to use MF-BP links yet. I don't know whether it's loading the inferred annotations from GO, but even if they are getting in, that set is woefully incomplete at the moment due to the bug from ticket geneontology/go-site#2226.

The question of whether the gene pages should show BP terms reached via MF-BP links feels like it's off on a tangent from the topic of this ticket. If GO materializes them (and fixes the bug), and we use the .inf.gaf annotations, it's moot.

kim: It's currently getting data only from Chado and there are no MF-BP relationships in Chado. We load Chado from go-plus.obo

midori: The MF-BP links are in the go. and go-plus. versions of the GO files.

Could you send me an example so I can track down why they aren't being stored in Chado?

here are three:
GO:0008233 peptidase activity part_of GO:0006508 proteolysis
GO:0071164 RNA trimethylguanosine synthase activity part_of GO:0036261 7-methylguanosine cap hypermethylation
*GO:0005484 SNAP receptor activity part_of GO:0061025 membrane fusion

*for this example, the inferred annotation should be in the inf.gaf but isn't (that bug)
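One way to check whether links like these are present in an OBO file before it is loaded is to scan the [Term] stanzas for `relationship: part_of` lines that go from a molecular_function term to a biological_process term. The sketch below is a deliberately naive parser over a made-up OBO snippet; the function name is hypothetical, and it assumes the link is written as a plain `relationship:` tag in the term stanza, which may not match how the real file encodes every link.

```python
def cross_namespace_part_of(obo_text):
    """Return (MF id, BP id) pairs for part_of links that cross from MF into BP."""
    terms = {}
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are of interest; [Typedef] etc. are skipped.
            current = {"part_of": []} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        value = value.split(" ! ")[0].strip()  # drop the trailing "! name" comment
        if tag == "id":
            terms[value] = current
        elif tag == "namespace":
            current["namespace"] = value
        elif tag == "relationship":
            parts = value.split()
            if len(parts) >= 2 and parts[0] == "part_of":
                current["part_of"].append(parts[1])

    links = []
    for term_id, term in terms.items():
        for target_id in term["part_of"]:
            target = terms.get(target_id)
            if (
                term.get("namespace") == "molecular_function"
                and target is not None
                and target.get("namespace") == "biological_process"
            ):
                links.append((term_id, target_id))
    return links


# Made-up, heavily trimmed OBO text based on the first example link above.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: GO:0008233
name: peptidase activity
namespace: molecular_function
relationship: part_of GO:0006508 ! proteolysis

[Term]
id: GO:0006508
name: proteolysis
namespace: biological_process

[Typedef]
id: part_of
name: part of
"""

links = cross_namespace_part_of(SAMPLE_OBO)
```

If a check like this finds the links in the file but they are missing from the database, the problem is more likely in the loading step than in the ontology file.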

val: If the F-P links are fiddly, put them on the back burner, because it only affects the small number of modification process term totals where I have filtered. You might prefer that we do this another way. We could load all of the data (which should make all the queries correct even if F-P links are excluded), and then just flag some terms for the annotation not to appear on the gene pages, if this is easier. I would just need to process the list as GO IDs are added for other reasons. One is if there are terms we don't like to annotate to at all, like "cell proliferation". We can discuss on next call.

midori: The question of whether the gene pages should show BP terms reached via MF-BP links feels like it's off on a tangent from the topic of this ticket. If GO materializes them (and fixes the bug), and we use the .inf.gaf annotations, it's moot.

Val: I don't think it is quite, because of some quirky filtering I do (see above). However, for now we won't worry about this. I will check future counts and if there is an issue I'll open a ticket explaining properly.

Midori: Does annotation shown in Amigo2 include descendants found by following MF-BP links? I did a bit of digging to try to answer this q. Of course it's not a simple, convenient "yes" or "no"; it depends on whether you start from a gene- or GO term-centric perspective.

On the page of details for a gene (product), it only shows directly annotated terms in the main display table. That set of direct annotations will include ones derived by following MF-BP links*, though, because the Jenkins system generates direct annotations and we incorporate them (that's the inf.gaf file I've been banging on about).

There's also a link in the left-hand sidebar to see all terms annotated by transitivity, which follows all is_a, part_of, and regulates links, including MF-part_of-BP. The annotation download thingy offers a somewhat unintuitive set of options that look like they can include or exclude annotations inferred from MF-BP even if they aren't available as 'direct' annotations via the .inf.gaf route, but I didn't try testing it.
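To make "annotated by transitivity" concrete, here is a small Python sketch that collects the ancestors of a term by following a chosen set of relation types, with a switch to include or exclude the MF-to-BP step. The term IDs (`MF:...`, `BP:...`), the edges and the function names are invented for the example, and all relation types are treated alike, which is a simplification. It is not a description of how AmiGO does it.

```python
from collections import defaultdict

# Invented toy ontology: (child, relation, parent).
EDGES = [
    ("MF:a_kinase", "is_a", "MF:kinase"),
    ("MF:kinase", "part_of", "BP:phosphorylation"),
    ("BP:phosphorylation", "is_a", "BP:modification"),
    ("BP:other", "has_part", "BP:phosphorylation"),  # relation type that is not followed
]

FOLLOWED = {"is_a", "part_of", "regulates"}


def build_parents(edges, relations, allow_mf_to_bp=True):
    """Map each term to its direct parents over the followed relation types."""
    parents = defaultdict(set)
    for child, relation, parent in edges:
        if relation not in relations:
            continue
        crosses = child.startswith("MF:") and parent.startswith("BP:")
        if crosses and not allow_mf_to_bp:
            continue
        parents[child].add(parent)
    return parents


def ancestors(term, parents):
    """All terms reachable from `term` by repeatedly following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


with_links = ancestors("MF:a_kinase", build_parents(EDGES, FOLLOWED))
without_links = ancestors(
    "MF:a_kinase", build_parents(EDGES, FOLLOWED, allow_mf_to_bp=False)
)
```

With the MF-BP step allowed, an annotation to the invented `MF:a_kinase` term would also be retrieved under both BP terms; with it switched off, only the MF parent is reached. A flag like this is one possible way to offer the include/exclude option for inter-ontology retrieval mentioned earlier in the thread.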

It's slightly different if I start with a GO term - the results include a "genes and gene products" link, and that page has a table that does include genes annotated transitively via MF-BP. [I was able to test this because of that GO bug (https://github.com/geneontology/go-site/issues/2226), so I guess it's had a bit of a silver lining.]

The bottom line is that AmiGO does appear to follow MF-BP links in some cases, so there must be code in there somewhere to do it.

mah11 commented 7 years ago

Does this even belong on the website tracker?

ValWood commented 7 years ago

Probably not. It can sit here for now, it's tangentially related to website (probably should be on chado tracker?). I'm not sure what the action is, but I want to keep an eye on F-P inferences we pull in from GO, so I'll take the ticket and migrate if required once I have done checks...

ValWood commented 7 years ago

I need to follow up on this, on my list

ValWood commented 6 years ago

might be related? https://github.com/pombase/website/issues/485

ValWood commented 3 years ago

We don't need this ticket.

ValWood commented 3 years ago

will update https://github.com/pombase/website/issues/485