Open kkaris opened 3 months ago
Could you check which resources specifically are implicated?
The one I saw in my timeout had to do with ec-codes:
INFO: [2024-06-03 18:15:51] pystow.utils - downloading with urllib from ftp://ftp.expasy.org/databases/enzyme/enzclass.txt to /root/.data/pyobo/raw/eccode/2024-05-29/enzclass.txt
INFO: [2024-06-03 18:15:53] pystow.utils - downloading with urllib from ftp://ftp.expasy.org/databases/enzyme/enzyme.dat to /root/.data/pyobo/raw/eccode/2024-05-29/enzyme.dat
INFO: [2024-06-03 18:15:55] pystow.utils - downloading with urllib from http://current.geneontology.org/ontology/external2go/ec2go to /root/.data/pyobo/raw/eccode/2024-05-29/ec2go.tsv
I'll get a list of all resources that are implicated.
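One way to collect the implicated resources is to scan the backend log for pystow download lines like the three quoted above. The following is a minimal sketch, not code from the project: the helper name `implicated_resources` is hypothetical, and the path layout `.../pyobo/raw/<prefix>/<version>/<file>` is an assumption inferred only from those three log lines.

```python
import re

# Matches pystow download log lines like the ones quoted above.
# Assumption: the target path looks like .../pyobo/raw/<prefix>/<version>/<file>.
DOWNLOAD_RE = re.compile(
    r"downloading with urllib from (?P<url>\S+) "
    r"to \S*/pyobo/raw/(?P<prefix>[^/]+)/(?P<version>[^/]+)/(?P<file>\S+)"
)


def implicated_resources(log_lines):
    """Group downloaded files by (prefix, version) for every download line found."""
    found = {}
    for line in log_lines:
        match = DOWNLOAD_RE.search(line)
        if match:
            key = (match["prefix"], match["version"])
            found.setdefault(key, []).append(match["file"])
    return found
```

Feeding it the log of a request that timed out would give one entry per resource that was downloaded at runtime, which is the list of candidates for a version pin.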
I'm excluding pyobo calls that are in processors, as they are not used when serving the REST API for the discovery apps. I found two instances:
- the metabolite/discrete endpoint
- the /gene/continuous/ endpoint, via ContinuousForm.get_scores() -> get_rat/mouse/human_scores() -> _get_species_scores
Re the EC-codes: The HGNCEnzymeProcessor actually uses the bio-ontology, so we could either:
1. Switch client/enrichment/mla.py to bio-ontology calls instead, or
2. Switch HGNCEnzymeProcessor to pyobo calls, or
3. Replace the lookup in client/enrichment/mla.py by modifying the query to get the name as well from the node

~I think option 3 makes the most sense, then we always stay consistent with the data in the database.~ There are two use cases there: a) getting names from EC-codes that exist in CoGEx, and b) getting HGNC ids, translating them to EC-codes, then getting the name. In a) we can replace the lookup by simply querying for the name as well, but in b) we still need to get the name. In this case I think option 2 above is the way to go, since that's what we use to create the DB.
I recently ran into a timeout when testing one of the frontend apps at discovery.indra.bio and saw on the backend that pyobo was downloading new files, which caused the timeout. To resolve this, we can add version pins to the pyobo calls wherever they show up, so that no downloads are triggered at runtime when requests to the various apps come in.
See also: https://github.com/biopragmatics/pyobo/pull/181 and https://github.com/biopragmatics/pyobo/pull/184.
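The version-pinning idea could look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: `PINNED_VERSIONS`, `pin_version`, and `fake_lookup` are hypothetical names, and `fake_lookup` only stands in for a pyobo lookup that accepts a `version` keyword (check the signature in the installed pyobo release). The only concrete value taken from this thread is the eccode version `2024-05-29` from the download log.

```python
from functools import partial

# Pinned versions per resource prefix. "2024-05-29" is the eccode version seen
# in the download log above; other prefixes would need their own entries.
PINNED_VERSIONS = {"eccode": "2024-05-29"}


def pin_version(lookup, prefix, pins=PINNED_VERSIONS):
    """Bind `lookup` to `prefix` and its pinned version.

    Assumes `lookup` has the shape lookup(prefix, identifier, *, version=...).
    Raises KeyError for a prefix without a pin, so a missing pin fails when
    the lookup is set up rather than at request time.
    """
    return partial(lookup, prefix, version=pins[prefix])


def fake_lookup(prefix, identifier, *, version=None):
    """Hypothetical stand-in for a versioned pyobo lookup; just echoes its inputs."""
    return f"{prefix}:{identifier}@{version}"


# Every call through get_ec_name now requests the same, pinned version.
get_ec_name = pin_version(fake_lookup, "eccode")
```

Whether a pin fully avoids runtime downloads presumably also depends on the pinned version already being present in the local cache when the service starts.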