greenelab / adage-frontend

The Adage web app, a tool to explore gene expression data and discover new insights from machine learning models
https://adage.greenelab.com
BSD 3-Clause "New" or "Revised" License

Deprecate Tribe #236

Closed (vincerubinetti closed 1 year ago)

vincerubinetti commented 1 year ago

Tribe is scheduled to be shut down on May 1st. This app relies on one query to Tribe to get pickled genesets. Since the query is only based on organism, and there is only one organism (in the built-in models), we can simply hard code the list of pickled genesets. We discovered with Ben Heil's mousiplier that this code is unfortunately pretty hard-coded already to these specific models/datasets, so we might as well.

https://github.com/greenelab/adage-frontend/blob/master/src/backend/signatures.js#L50-L57
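For illustration, the hard-coding option might look roughly like the sketch below. This is not the code in `signatures.js`; the names `PICKLED_GENESETS`, `lookupPickledGenesets`, and `getPickledGenesets` are hypothetical, and the single entry is a placeholder rather than real Tribe data.

```javascript
// Hypothetical stand-in for the Tribe request: genesets bundled with the app,
// keyed by organism. The entry below is a placeholder, NOT real geneset data.
const PICKLED_GENESETS = new Map([
  ["Pseudomonas aeruginosa", [{ name: "placeholder geneset", genes: [] }]],
]);

// Synchronous lookup; throws if no genesets are bundled for the organism.
function lookupPickledGenesets(organism) {
  const genesets = PICKLED_GENESETS.get(organism);
  if (genesets === undefined) {
    throw new Error(`No bundled genesets for "${organism}"`);
  }
  return genesets;
}

// Async wrapper that keeps the shape of a network call, so callers that
// currently await the Tribe query would not need to change.
async function getPickledGenesets(organism) {
  return lookupPickledGenesets(organism);
}
```

Keeping the async wrapper would make it easy to swap in a live query later without touching the callers.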

Perhaps querying mygene.info for genes by organism would satisfy this? I'm not even sure what the biological significance of "pickled" is in this case (or in any case). Maybe @cgreene could answer? Otherwise I'm happy to just hard code.

vincerubinetti commented 1 year ago

@falquaddoomi Reminder to dump the Pseudomonas data from Tribe, and you can paste it here.

cgreene commented 1 year ago

I think querying mygeneset.info by organism would be ideal - is that out of the question? I think they were just pickled because Tribe wasn't responsive enough to return them live.

vincerubinetti commented 1 year ago

Do you mean mygene.info? Mygeneset.info doesn't seem to return any genesets for "pseudomonas aeruginosa".

The mygene.info query: https://mygene.info/v3/query?q=taxid:287&size=1000 (I think that's the right taxon id?)
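A minimal sketch of issuing that query from the frontend, assuming a runtime with the global `fetch` and `URL` APIs. The function names are hypothetical, and the taxon ID is left as a parameter since it is not yet confirmed which one is right.

```javascript
// Builds a mygene.info v3 query URL for genes under one NCBI taxon ID.
// Note: URLSearchParams percent-encodes the colon in "taxid:287" as %3A.
function buildMyGeneQueryUrl(taxid, size = 1000) {
  const url = new URL("https://mygene.info/v3/query");
  url.searchParams.set("q", `taxid:${taxid}`);
  url.searchParams.set("size", String(size));
  return url.toString();
}

// Fetches one page of matching genes; mygene.info returns them under "hits".
async function fetchGenesByTaxid(taxid, size = 1000) {
  const response = await fetch(buildMyGeneQueryUrl(taxid, size));
  if (!response.ok) {
    throw new Error(`mygene.info request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.hits;
}
```

Calling `fetchGenesByTaxid(287)` would request the same data as the URL above, one page at a time.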

Also, does "pickled" in this case mean hardcoded? Changing it from hardcoded to a live query might change results that users have gotten used to. It might break some tests too, so we might have to update the test fixture data.

cgreene commented 1 year ago

Pickled essentially means hardcoded. I think having it be live query results is better. I didn't realize there were no genesets over at mygeneset.info for Pseudomonas - maybe we can see why things aren't turning up?

cgreene commented 1 year ago

It looks like GO maps to PAO1 (screenshot, 2023-02-02).

I'm pretty sure this is the taxonomy ID 208964: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=208964&lvl=3&lin=f&keep=1&srchmode=1&unlock

However, I'm not finding genesets: https://mygeneset.info/v1/query?species=208964&size=10&fields=all&always_list=genes%2Cgenes.alias%2Cgenes.entrezgene%2Cgenes.symbol%2Cgenes.ensembl%2Cgenes.ensemblgene%2Cgenes.uniprot

newgene commented 1 year ago

For the GO data source, mygeneset.info loads only the set of species listed below:

https://github.com/biothings/mygeneset.info/blob/e14100ed5b4f52cfea6b7b1a02fda0b92da0eb58/src/plugins/go/manifest.json#L9-L13

Can you tell us whether there are any other annotation files we should also include from here:

http://current.geneontology.org/annotations/

P.S. You can see the list of taxids/species we support in mygeneset.info via this query:

https://mygeneset.info/v1/query?size=0&aggs=taxid&facet_size=100
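A rough sketch of using that query to check whether a given taxon ID is loaded. The response shape read here (`facets.taxid.terms`, each entry carrying a `term`) is an assumption about how the aggregation is returned and should be checked against a real response; the function names are hypothetical.

```javascript
const TAXID_FACET_URL =
  "https://mygeneset.info/v1/query?size=0&aggs=taxid&facet_size=100";

// Extracts supported taxon IDs from a facet response.
// ASSUMPTION: the response looks like { facets: { taxid: { terms: [{ term }] } } }.
function supportedTaxids(response) {
  const terms = response?.facets?.taxid?.terms ?? [];
  // Normalize to strings, since a term may arrive as a number or a string.
  return new Set(terms.map((entry) => String(entry.term)));
}

// True if the taxon ID appears among the facet terms.
function isTaxidSupported(response, taxid) {
  return supportedTaxids(response).has(String(taxid));
}

// Fetches the facet listing and checks one taxon ID against it.
async function checkTaxidSupport(taxid) {
  const response = await fetch(TAXID_FACET_URL);
  if (!response.ok) {
    throw new Error(`mygeneset.info request failed: ${response.status}`);
  }
  return isTaxidSupported(await response.json(), taxid);
}
```

A check like `checkTaxidSupport(208964)` could tell the app whether to use the live query or fall back to bundled genesets.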

cgreene commented 1 year ago

Ahh! Can you add http://current.geneontology.org/annotations/pseudocap.gaf.gz ?

falquaddoomi commented 1 year ago

Glad to see that these are going to get loaded into mygeneset.info! In the interim, here's an archive containing both the original pickled geneset for Pseudomonas aeruginosa as well as the result of hitting the Adage backend's endpoint with that organism as the argument.

Specifically, the archive contains the following files:

Pseudomonas_aeruginosa_tribe_genesets.zip