intermine / pombemine


Gene represented differently in different query outputs (SPBC28F2.11) #50

Closed. ValWood closed this issue 2 years ago.

ValWood commented 2 years ago

In this query output:

[Screenshot: PM_query]

I see

[Screenshot: PM_output_no_value]

We have figured out where all of these genes with no "Gene Name" and no "feature type" come from, except for SPBC28F2.11. Most are gene IDs that are now synonyms of non-coding RNAs, or that have been deprecated.

However, SPBC28F2.11 is extant, and it is also correctly represented here (with a symbol and a feature type):

[Screenshot 2022-05-23 at 11 30 48]

Can we trace the origin of the one with "NO VALUE"?

ValWood commented 2 years ago

@kimrutherford could you double check this, because as far as I am aware PombeMine is not importing genes from another source?

kimrutherford commented 2 years ago

> could you double check this,

I think I've tracked this down. I worked out that "cerevisiae-orthologs data set" is: https://www.pombase.org/ftp/pombe/orthologs/cerevisiae-orthologs.txt (obvious now I look at it!)

I also found that "SPBC28F2.11" in that file has a trailing space. So my bet is that PombeMine has one gene with ID "SPBC28F2.11" carrying all the data, and another with ID "SPBC28F2.11 " that has no length. I'll fix the ortholog file in SVN; the copy on the website/FTP site will be updated overnight.
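A minimal sketch of how this kind of problem could be caught before loading, assuming the ortholog file is tab-separated with the S. pombe systematic ID in the first column and `#` marking comment lines (the exact layout of cerevisiae-orthologs.txt is an assumption here). The function names `find_whitespace_ids` and `load_orthologs` are hypothetical, not part of InterMine or PomBase tooling.

```python
def find_whitespace_ids(lines):
    """Report (line_number, raw_id) for first-column IDs with stray whitespace.

    Assumes (hypothetically) a tab-separated file whose first column is the
    S. pombe systematic ID; lines starting with '#' are treated as comments.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        # Strip only the line ending, so trailing spaces in the ID survive.
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        raw_id = line.split("\t")[0]
        if raw_id != raw_id.strip():
            problems.append((lineno, raw_id))
    return problems


def load_orthologs(lines):
    """Group ortholog columns by *stripped* gene ID.

    Keying on the raw string would treat "SPBC28F2.11 " and "SPBC28F2.11"
    as two different genes; stripping merges them into one entry.
    """
    orthologs = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        gene_id = fields[0].strip()
        partner = fields[1].strip() if len(fields) > 1 else ""
        orthologs.setdefault(gene_id, []).append(partner)
    return orthologs


if __name__ == "__main__":
    # Hypothetical rows: ORTHOLOG_A/ORTHOLOG_B are placeholders, not real data.
    sample = ["SPBC28F2.11 \tORTHOLOG_A\n", "SPBC28F2.11\tORTHOLOG_B\n"]
    print(find_whitespace_ids(sample))  # [(1, 'SPBC28F2.11 ')]
    print(load_orthologs(sample))       # {'SPBC28F2.11': ['ORTHOLOG_A', 'ORTHOLOG_B']}
```

A loader that used the raw first column as the key would end up with two distinct IDs here, which is consistent with the duplicate, length-less gene suspected above. Whether to strip silently or fail loudly on such rows is a design choice; reporting them makes the source file easier to fix.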

> because as far as I am aware PombeMine is not importing genes from another source?

Genes are also loaded from the "BioGRID interaction data set", unless that's actually the PomBase interaction data set. It's been so long that I can't remember where the InterMine configuration lives, so I can't check.

ValWood commented 2 years ago

OK, I'll assume this one will be fixed and close the issue.