Closed vincerubinetti closed 1 year ago
@falquaddoomi Reminder to dump the Pseudomonas data from Tribe, and you can paste it here.
I think querying mygeneset.info by organism would be ideal - is that out of the question? I think they were just pickled because Tribe wasn't responsive enough to return them live.
Do you mean mygene.info? Mygeneset.info doesn't seem to return any genesets for "pseudomonas aeruginosa".
The mygene.info query: https://mygene.info/v3/query?q=taxid:287&size=1000 (I think that's the right taxon id?)
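For reference, a minimal Python sketch of building and running that query. The function names are made up for illustration, and reading the results from a top-level "hits" key is an assumption about the response shape, so check it against a real response before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_query_url(taxid, size=1000):
    """Build a mygene.info query URL for genes under one taxon ID.

    safe=":" keeps the colon in "taxid:287" unescaped so the URL
    matches the one pasted in the thread.
    """
    params = {"q": f"taxid:{taxid}", "size": size}
    return f"{MYGENE_QUERY_URL}?{urlencode(params, safe=':')}"


def fetch_genes(taxid, size=1000):
    """Run the query and return the list of gene hits (assumed key: "hits")."""
    with urlopen(build_gene_query_url(taxid, size)) as response:
        return json.load(response).get("hits", [])
```

Keeping the taxon ID as a parameter makes it easy to rerun the same query if the species-level ID turns out to be the wrong one.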
Also does pickled in this case mean hardcoded? Changing it from hardcoded to a live query might change results that users may have gotten used to? Might break some tests too, might have to update the test fixture data.
Pickled essentially means hardcoded. I think having it be live query results is better. I didn't realize there were no genesets over at mygeneset.info for Pseudomonas - maybe we can see why things aren't turning up?
It looks like GO maps to PAO1
I'm pretty sure the taxonomy ID is 208964: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=208964&lvl=3&lin=f&keep=1&srchmode=1&unlock
However, I'm not finding genesets: https://mygeneset.info/v1/query?species=208964&size=10&fields=all&always_list=genes%2Cgenes.alias%2Cgenes.entrezgene%2Cgenes.symbol%2Cgenes.ensembl%2Cgenes.ensemblgene%2Cgenes.uniprot
For the GO data source, mygeneset.info loads only the set of species below:
Can you tell if there are any other annotation files we might also include from here:
http://current.geneontology.org/annotations/
P.S. You can see the list of taxids/species we support in mygeneset.info via this query:
https://mygeneset.info/v1/query?size=0&aggs=taxid&facet_size=100
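A quick way to check whether a given taxid is covered is to parse that aggregation response. This is only a sketch: it assumes the response nests the counts under facets -> taxid -> terms, with "term" and "count" on each entry, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen

TAXID_FACET_URL = (
    "https://mygeneset.info/v1/query?size=0&aggs=taxid&facet_size=100"
)


def supported_taxids(facet_response):
    """Map each taxid (as a string) to its geneset count.

    Assumes the shape {"facets": {"taxid": {"terms": [{"term", "count"}]}}};
    returns an empty dict if any of those keys are missing.
    """
    terms = facet_response.get("facets", {}).get("taxid", {}).get("terms", [])
    return {str(entry["term"]): entry["count"] for entry in terms}


def fetch_supported_taxids():
    """Fetch the live aggregation and return the taxid -> count mapping."""
    with urlopen(TAXID_FACET_URL) as response:
        return supported_taxids(json.load(response))
```

With this, `"208964" in fetch_supported_taxids()` tells you whether any genesets are loaded for that taxid, which is an easy check to repeat once the new annotation file has been added.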
Ahh! Can you add http://current.geneontology.org/annotations/pseudocap.gaf.gz ?
Glad to see that these are going to get loaded into mygeneset.info! In the interim, here's an archive containing both the original pickled geneset for Pseudomonas aeruginosa as well as the result of hitting the Adage backend's endpoint with that organism as the argument.
Specifically, the archive contains the following files:
Pseudomonas_aeruginosa_pickled_genesets: pulled from the Adage backend API's ./data folder. This is a pickled data structure that's loaded and returned with some minor postprocessing by the return_unpickled_genesets Adage API endpoint.
return_unpickled_genesets - Pseudomonas aeruginosa.json: returned from the URL https://api-adage.greenelab.com/api/v1/tribe_client/return_unpickled_genesets?organism=Pseudomonas aeruginosa. The implementation of that endpoint and the structure of the returned JSON can be found here: https://github.com/greenelab/adage-backend/blob/master/adage/tribe_client/views.py#L236
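If anyone wants to inspect the pickled file directly, something like the following should work. It is a sketch under assumptions: I don't know the exact structure inside the pickle, the encoding="latin1" argument is only there in case the file was written by Python 2, and the helper names are made up. Only unpickle files you trust, since unpickling can execute arbitrary code.

```python
import json
import pickle
from pathlib import Path


def load_pickled_genesets(path):
    """Load the pickled genesets object from disk.

    encoding="latin1" is an assumption that helps when the pickle was
    written by Python 2; it is harmless for Python 3 pickles.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle, encoding="latin1")


def dump_as_json(genesets, path):
    """Write the loaded object as JSON so it is easy to read and diff.

    default=str is a fallback for values JSON cannot represent natively.
    """
    Path(path).write_text(json.dumps(genesets, indent=2, default=str))
```

Dumping to JSON makes it straightforward to compare the pickled contents against the endpoint's JSON response in the same archive.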
Tribe is scheduled to be shut down on May 1st. This app relies on one query to Tribe to get pickled genesets. Since the query is only based on organism, and there is only one organism (in the built-in models), we can simply hard-code the list of pickled genesets. We discovered with Ben Heil's mousiplier that this code is unfortunately pretty hard-coded already to these specific models/datasets, so we might as well.
https://github.com/greenelab/adage-frontend/blob/master/src/backend/signatures.js#L50-L57
Perhaps querying mygene.info for genes by organism would satisfy this? I'm not even sure what the biological significance of "pickled" is in this case (or in any case). Maybe @cgreene could answer? Otherwise I'm happy to just hard code.