japonicusdb / japonicus-curation

Data files for JaponicusDB

Include PAINT annotations for S. japonicus #57

Closed ValWood closed 7 months ago

ValWood commented 1 year ago

Japonicus is now included in PANTHER (this probably happened a while ago, I hadn't checked).

This means that S. japonicus is now a species which is "PAINTED" and supplies experimental annotations for transfer from the PAINT project.

There are now 10,878 PAINTED GO annotations for S. japonicus: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=4897&taxonUsage=descendants&evidenceCode=ECO:0000318&evidenceCodeUsage=descendants

Most, if not all, of these should be covered by our IEA pipeline. However, direct transfer from fission yeast with an IEA (electronic) evidence code loses some of the provenance and isn't as good as an IBA (Inferred from Biological aspect of Ancestor) annotation.

So, could we a) import the IBAs for S. japonicus and b) give IBA precedence over IEA for japonicus?
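One way to read "give IBA precedence over IEA" is: when the same gene and GO term carry both an IBA and an IEA annotation, keep the IBA and drop the IEA. The sketch below illustrates only that rule. The function name, the tuple layout, and the gene IDs are hypothetical; it is not the JaponicusDB or PomBase loader, and it matches on exact GO term IDs only, which a real pipeline may not do.

```python
def drop_iea_covered_by_iba(annotations):
    """Drop IEA annotations that duplicate an IBA for the same gene and term.

    `annotations` is a list of (gene_id, go_id, evidence_code) tuples.
    This layout is an assumption made for illustration only.
    All other evidence codes pass through untouched.
    """
    # Collect every (gene, term) pair that already has an IBA annotation.
    iba_keys = {(gene, go_id) for gene, go_id, ev in annotations if ev == "IBA"}
    return [
        (gene, go_id, ev)
        for gene, go_id, ev in annotations
        if not (ev == "IEA" and (gene, go_id) in iba_keys)
    ]


# Hypothetical gene IDs, for illustration only.
example = [
    ("SJAG_00001", "GO:0005634", "IEA"),
    ("SJAG_00001", "GO:0005634", "IBA"),
    ("SJAG_00002", "GO:0005737", "IEA"),
]
print(drop_iea_covered_by_iba(example))
```

An IEA annotation with no matching IBA is kept, so genes outside PANTHER families would not lose coverage under this rule.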

(I think we said in the paper that we planned to do this)

CC @snezhkaoliferenko
also congrats to you and Gugs on your "amazeballs" Nat Comm. peroxisomal compartmentation paper!

kimrutherford commented 1 year ago

give IBA precedence over IEA for japonicus.

So different to pombe?

Here's the priority list we use for PomBase: https://github.com/pombase/pombase-chado/issues/695#issuecomment-709565296

ValWood commented 1 year ago

I dropped the ball on that too. We can switch the order for pombe. I made a ticket https://github.com/pombase/pombase-chado/issues/1112

kimrutherford commented 1 year ago

For PomBase we load PANTHER annotations from: http://snapshot.geneontology.org/annotations/pombase.gaf.gz (I think) but the japonicus GAF (http://snapshot.geneontology.org/annotations/japonicusdb.gaf.gz) file doesn't seem to contain any annotations that look like they come from PANTHER.

Am I looking in the right place?

ValWood commented 1 year ago

Hmm, I don't know. I know they exist because I can see them in GOA. @pgaudet do you know why the japonicus IBAs would not be here, but are in GOA?

pgaudet commented 1 year ago

Not sure why they are not being exported.

@kltm do you know ?

kimrutherford commented 1 year ago

Hmm, I don't know. I know they exist because I can see them in GOA.

I checked the most recent GOA GAF file and there are 10878 japonicus annotations that look like they are from PANTHER. Should we load those?
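A check like this can be sketched as a small GAF scan. The code below assumes the standard GAF 2.x column order (evidence code in column 7, taxon in column 13) and assumes that "from PANTHER" can be approximated by the IBA evidence code. The function names are hypothetical, and this is not necessarily how the count above was produced.

```python
import gzip
from collections import Counter


def count_iba_by_taxon(lines):
    """Count IBA annotations per taxon in an iterable of GAF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:  # skip malformed or truncated rows
            continue
        if cols[6] == "IBA":  # column 7: evidence code
            # Column 13 may hold "taxon:A|taxon:B"; the first entry is the
            # organism the annotated gene product belongs to.
            counts[cols[12].split("|")[0]] += 1
    return counts


def count_iba_in_gaf(path):
    """Same count, reading a gzip-compressed GAF file from `path`."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_iba_by_taxon(handle)
```

Reporting the counts per taxon rather than for a single taxon ID makes it visible when the annotations sit under a different ID from the one expected.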

pgaudet commented 1 year ago

Where? I don't see IBAs here: http://snapshot.geneontology.org/annotations/japonicusdb.gaf.gz. Maybe you have a different file?

kimrutherford commented 1 year ago

I was looking at: https://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz

pgaudet commented 1 year ago

Ah!! Great! I am not involved with the GOA pipeline, so I didn't think to check there.

If this meets your immediate need, great! I'll still check why GO Central doesn't have these annotations.

ValWood commented 1 year ago

@kimrutherford are the pombe ones in the equivalent snapshot file?

If they disappear we might not notice because we would (presumably) get the PAINT annotations from the GOA load as a fall back?

kimrutherford commented 1 year ago

are the pombe ones in the equivalent snapshot file?

Yep!

We load them from the snapshot rather than the GOA GAF so they are more up to date. (I think that's the reason)

If they disappear we might not notice because we would (presumably) get the PAINT annotations from the GOA load as a fall back?

Currently we don't load the PAINT annotations from the GOA file, so if they disappear from the snapshot file we wouldn't have any PAINT annotations.

ValWood commented 1 year ago

OK, then we would notice! Will wait for @kltm on why the japonicus ones are missing.

kltm commented 1 year ago

We are talking about PAINT-generated IBAs, correct?

I'd note that S. japonicus was never added as a PAINT species during any update. See the current sets in http://data.pantherdb.org/ftp/downloads/paint/17.0/2023-06-05/presubmission/ and the lack of an entry in the go-site metadata paint.yaml. That said, the annotations do occur in files loaded into AmiGO, namely paint_other.gaf; we have:

sjcarbon@moiraine:/tmp$:) zgrep "taxon:4897" paint_other.gaf.gz | wc -l
0
sjcarbon@moiraine:/tmp$:) zgrep "taxon:402676" paint_other.gaf.gz | wc -l
11227

Looking around in AmiGO, we have: https://amigo.geneontology.org/amigo/search/annotation?q=SJAG_02056&sfq=document_category:%22annotation%22 and more generally: https://amigo.geneontology.org/amigo/search/annotation?q=*:*&fq=taxon_subset_closure_label:%22Schizosaccharomyces%20japonicus%20yFS275%22&

There seems to be no taxon overlap with what we are getting from JaponicusDB directly from their GAF: https://amigo.geneontology.org/amigo/search/annotation?q=*:*&fq=taxon_subset_closure_label:%22Schizosaccharomyces%20japonicus%22&sfq=document_category:%22annotation%22

Essentially, PAINT here is taxon:402676 and the JaponicusDB GAF is taxon:4897. Perhaps this is the source of confusion?

Tagging on @dustine32 just in case

ValWood commented 1 year ago

The species taxon ID is 4897 https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=4897

402676 is a strain ID https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=402676

For a while NCBI added strain IDs. They no longer do this. I think the Panther/PAINT annotation should probably migrate to the species level (We annotate GO at the species level, not the strain).

Who do we need to inform, @huaiyumi ?

dustine32 commented 1 year ago

As @kltm pointed out, PAINT is releasing the IBAs for japonicus in paint_other.gaf rather than in its own paint_japonicus.gaf file. PomBase could extract these japonicus IBAs out of paint_other or we may consider creating the separate paint_japonicus.gaf along with the appropriate go-site/metadata/datasets/paint.yaml entry containing a merges_into: japonicusdb property.
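The first option, extracting the japonicus IBAs from paint_other.gaf, could look roughly like the sketch below. It assumes the standard GAF 2.x column order and filters on the strain taxon that the file uses. The optional rewrite to the species taxon changes only the taxon column; it does not deal with any differences in sequence or gene identifiers between the two taxa. The function and parameter names are hypothetical.

```python
STRAIN_TAXON = "taxon:402676"   # S. japonicus yFS275, as used by PAINT
SPECIES_TAXON = "taxon:4897"    # S. japonicus, as used by JaponicusDB


def extract_japonicus_iba(lines, remap_to_species=False):
    """Yield the IBA rows for the japonicus strain taxon from GAF lines.

    With remap_to_species=True the first taxon in column 13 is rewritten
    to the species-level ID; nothing else in the row is changed.
    """
    for line in lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15 or cols[6] != "IBA":  # column 7: evidence code
            continue
        taxa = cols[12].split("|")  # column 13: taxon(s)
        if taxa[0] != STRAIN_TAXON:
            continue
        if remap_to_species:
            taxa[0] = SPECIES_TAXON
            cols[12] = "|".join(taxa)
        yield "\t".join(cols)
```

To read the real file, the lines could come from `gzip.open("paint_other.gaf.gz", "rt")`. Keeping the remap behind a flag leaves the strain-versus-species decision explicit rather than baked into the extraction.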

The species taxon vs strain taxon conversation came up for pombe a while ago but I can't find the issue(s) or email thread with this discussion, unfortunately. PANTHER/PAINT uses strain ID 402676 for japonicus simply because that is the taxon that's tied to the Reference Proteome sequence data we use to build the trees. @huaiyumi Do you recall this discussion?

ValWood commented 1 year ago

Is it possible to switch it to the species taxon? It would make more sense to be consistent (and I think for most orgs we use the species?)

huaiyumi commented 1 year ago

I wonder if this has something to do with the Reference Proteome. We got the S. japonicus data from them. It is probably not a simple switch of taxon ID. The sequence IDs may be different under these two different taxa. I guess the best way to fix this is to work with the Reference Proteome.

ValWood commented 1 year ago

GO annotation should be at the species level though, right @pgaudet?

pgaudet commented 11 months ago

@ValWood Did Maria answer the question?

This is not really a GO problem; it's up to Reference Proteomes and Japonicus to agree on the reference strain.

ValWood commented 11 months ago

I'm guessing it will be resolved by PANTHER mapping the annotations over from the strain ID to the species ID.