Closed ValWood closed 2 years ago
I should mention that Kim's digging says this comes from: "GO Annotation data set"
but this is not a gene name in our annotation; it's an isoform identifier, and I can't see that we have used it in the PomBase GO annotations.
There is also another GO-data-derived gene, Q9P3V0. This appears in the gene list even when applying the filter for taxon 4896. But it is a human gene?
@kimrutherford
There are 2 genes represented in pombemine for SPCC548.03c.01 SPCC548.03c.02
That should be: SPCC548.03c.1 SPCC548.03c.2
One case where we use these IDs in a place that is mostly gene IDs is the "with" column of the GAF file. For example:
PomBase SPCC1906.03 wtf19 GO:0005737 PMID:32032353 ISS PomBase:SPCC548.03c.1 C wtf meiotic drive antidote Wtf19 protein taxon:4896 20200914 PomBase part_of(CL:0000607)
PomBase SPCC1906.03 wtf19 GO:0005737 PMID:32032353 ISS PomBase:SPCC548.03c.2 C wtf meiotic drive antidote Wtf19 protein taxon:4896 20200914 PomBase part_of(CL:0000607)
Maybe they are being misunderstood as gene identifiers in that context?
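To make the suspicion above concrete, here is a minimal Python sketch that pulls the "with" column (column 8) out of a tab-separated GAF line and flags PomBase IDs that look like transcript/isoform IDs rather than gene IDs. The two regular expressions are assumptions inferred from the IDs quoted in this thread (for example SPCC1906.03 versus SPCC548.03c.1), not an official PomBase specification, and the function names are hypothetical.

```python
import re

# Assumed ID shapes, inferred from the examples in this thread only.
GENE_ID = re.compile(r"^SP[A-Z0-9]+\.\d+c?$")             # e.g. SPCC1906.03
TRANSCRIPT_ID = re.compile(r"^SP[A-Z0-9]+\.\d+c?\.\d+$")  # e.g. SPCC548.03c.1


def with_column_ids(gaf_line):
    """Return the IDs found in column 8 (With/From) of one GAF line."""
    cols = gaf_line.rstrip("\n").split("\t")
    field = cols[7] if len(cols) > 7 else ""
    # GAF separates With/From entries with '|' (OR) or ',' (AND).
    return [part for part in re.split(r"[|,]", field) if part]


def classify_with_id(curie):
    """Label a 'Prefix:ID' string as 'gene', 'transcript' or 'other'."""
    prefix, _, local_id = curie.partition(":")
    if prefix != "PomBase":
        return "other"
    if TRANSCRIPT_ID.match(local_id):
        return "transcript"
    if GENE_ID.match(local_id):
        return "gene"
    return "other"


example = "\t".join([
    "PomBase", "SPCC1906.03", "wtf19", "", "GO:0005737",
    "PMID:32032353", "ISS", "PomBase:SPCC548.03c.1", "C",
])
for with_id in with_column_ids(example):
    print(with_id, classify_with_id(with_id))
```

A loader could use a check like this to avoid creating a gene record for anything that is not classified as "gene".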
Right, that makes sense. Hmm, this is a real edge case. We can infer the location of the different specific versions of this protein (poison and antidote), and in this case we have specified the isoform (alternative transcript) ID in the with column.
I checked the docs http://geneontology.org/docs/go-annotation-file-gaf-format-2.1/#with-or-from-column-8 to see if this field is restricted to "gene", and it isn't, but isoform is not documented.
I suspect if we discussed this the format would be the same as an allele, so it would be DB:gene_symbol[isoform_symbol]
I will check this with GO
There is also another GO-data-derived gene, Q9P3V0. This appears in the gene list even when applying the filter for taxon 4896. But it is a human gene?
Did you paste the wrong ID? That one (Q9P3V0) is pombe wtf4. Did you mean Q9H9V9?
Now that I've investigated more this may be a similar problem to SPCC548.03c.1/SPCC548.03c.2
But this time it's a PomBase bug. In the GAF file we are prefixing everything in the "with" column with "PomBase:" so we have:
PomBase:SPAC25H1.02 RO:0002331 GO:0002184 GO_REF:0000050 ECO:0000266 PomBase:Q9H9V9 2007-05-31 PomBase
PomBase:SPAC25H1.02 RO:0002327 GO:0106156 GO_REF:0000050 ECO:0000266 PomBase:Q9H9V9 2007-05-31 PomBase
It's wrong in the GPAD file too. Whoops.
I've made an issue and I'll get to it this week: pombase/pombase-chado/issues/970
But this time it's a PomBase bug. In the GAF file we are prefixing everything in the "with" column with "PomBase:" so we have:
It looks as though we are not prefixing everything (because they usually resolve on the web pages). If we omit the prefix, PomBase must be inferred. I can fix this issue in the "legacy GO annotation file".
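For illustration only, one way to avoid output like PomBase:Q9H9V9 is to add a prefix only when the ID has none, and to choose it from the shape of the ID. The Python sketch below is not the actual pombase-chado code or fix. The pombe ID pattern is an assumption based on the IDs in this thread, the UniProtKB accession pattern follows the one published in the UniProt help pages, and the function name is hypothetical.

```python
import re

# Assumed shape of pombe systematic gene/transcript IDs (from this thread).
POMBE_ID = re.compile(r"^SP[A-Z0-9]+\.\d+c?(\.\d+)?$")
# UniProtKB accession pattern as given in the UniProt help pages.
UNIPROT_ACC = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def add_db_prefix(raw_id):
    """Prefix a bare 'with' ID, leaving already-prefixed IDs alone."""
    if ":" in raw_id:
        return raw_id                  # already has a database prefix
    if POMBE_ID.match(raw_id):
        return "PomBase:" + raw_id
    if UNIPROT_ACC.match(raw_id):
        return "UniProtKB:" + raw_id
    return raw_id                      # unknown shape: do not guess


print(add_db_prefix("Q9H9V9"))
print(add_db_prefix("SPCC548.03c.1"))
```

Returning unknown IDs unchanged is deliberate here: a missing prefix is easier to spot and fix later than a wrong one.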
GO ticket https://github.com/geneontology/helpdesk/issues/394
@danielabutano I have taken this ticket and I'll report back. It might be possible to improve how InterMine handles this field if IDs can be typed. Note that "protein complex" identifiers can also be used in this field (I am not sure how).
@danielabutano one thing I did wonder was about the value of adding the genes from the "with" field. The genes of interest will be loaded already from other routes.
OK, I have a response from GO: https://github.com/geneontology/helpdesk/issues/394. Basically, it isn't safe to assume that the IDs in the "with" field refer to genes.
But I think that is OK; we don't need to use these "with field" entries in any queries. They are really arbitrary sources of support for an annotation, but they aren't useful for querying, and therefore probably shouldn't be loaded as independent genes. As long as the string (prefix plus ID) is visible, people can look up the sources if they want to validate a specific annotation.
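As a rough sketch of that suggestion, the Python below keeps the "with" entries as plain display strings on each annotation and builds the gene set only from the annotated object in column 2. It is not InterMine code. The `Annotation` class and `load_gaf` function are hypothetical names, and the column positions assume a tab-separated GAF 2.x line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_id: str
    go_id: str
    evidence_code: str
    with_from: tuple  # opaque "Prefix:ID" strings, kept for display only


def load_gaf(gaf_lines):
    """Return (annotations, gene_ids); 'with' IDs never become genes."""
    annotations = []
    gene_ids = set()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # too short to hold a With/From column
        gene_ids.add(cols[1])  # only the annotated object counts as a gene
        with_from = tuple(
            part for part in cols[7].replace(",", "|").split("|") if part
        )
        annotations.append(Annotation(cols[1], cols[4], cols[6], with_from))
    return annotations, gene_ids


sample = "\t".join([
    "PomBase", "SPCC1906.03", "wtf19", "", "GO:0005737",
    "PMID:32032353", "ISS", "PomBase:SPCC548.03c.1", "C",
])
annotations, gene_ids = load_gaf(["!gaf-version: 2.1", sample])
print(gene_ids)
print(annotations[0].with_from)
```

With this shape the isoform ID stays visible next to the evidence code, but it cannot show up in a gene list.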
I wanted to see what the with field output looks like but I can't get a query to output this column. Where are the instructions for this?
Hi @ValWood, this query shows the genes created from the with column
this query is more precise
Below is the XML, if you want to import it:
Got it. I forgot I need to switch to "GO evidence"
There are 2 genes represented in pombemine for SPCC548.03c.01 SPCC548.03c.02
Neither @kimrutherford nor I can figure out where these originate (they are isoform IDs but not separate genes). We can't see where we export these. Are they coming from another source?
thanks v