saezlab / decoupler-py

Python package to perform enrichment analysis from omics data.
https://decoupler-py.readthedocs.io/
GNU General Public License v3.0

Error while translating human to mouse MSigDB #109

Closed. wbrett87 closed this issue 2 months ago

wbrett87 commented 3 months ago

Describe the bug: When I try to translate the human MSigDB network to mouse, I get the following error (screenshot below):

[image: screenshot of the error, not preserved]

To Reproduce: I followed the tutorial on Read the Docs.

System

wbrett87 commented 3 months ago

If I look further down the traceback, it looks like a problem with gzip.py. I will send a pic of the rest of the traceback when I get to my work computer.

arntetou commented 3 months ago

Hi @deeenes and @PauBadiaM

thanks a lot for the innovative bioinformatics tools generated in your lab!

I am planning to use decoupler for gene set enrichment analysis of my mouse and human datasets. I am currently trying, so far unsuccessfully, to reproduce the homology conversion with decoupler 1.6.0, following the guide at https://decoupler-py.readthedocs.io/en/latest/notebooks/translate.html

The following command resulted in an error:

# Translate targets
mouse_msigdb = dc.translate_net(msigdb, target_organism = 'mouse', unique_by = ('geneset', 'genesymbol'))

Error message:

  File "/home/arts10/.conda/envs/decoupler/lib/python3.12/site-packages/pypath/utils/mapping.py", line 258, in __init__
    self.load()
  File "/home/arts10/.conda/envs/decoupler/lib/python3.12/site-packages/pypath/utils/mapping.py", line 288, in load
    self.read()
  File "/home/arts10/.conda/envs/decoupler/lib/python3.12/site-packages/pypath/utils/mapping.py", line 450, in read
    getattr(self, method)()
  File "/home/arts10/.conda/envs/decoupler/lib/python3.12/site-packages/pypath/utils/mapping.py", line 561, in read_mapping_file
    for i, line in enumerate(infile):
  File "/home/arts10/.conda/envs/decoupler/lib/python3.12/site-packages/pypath/inputs/uniprot.py", line 353, in get_uniprot_sec
    for i, line in enumerate(c.result):
                   ^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not iterable 

I tried downgrading pypath-omnipath, but that did not solve the problem.

Could you please recommend how to solve this issue, so that I can start using this great tool in my analysis pipeline?

Thank you so much in advance for your help.

Best,

arntetou

deeenes commented 2 months ago

Hi @wbrett87 & @arntetou, These are two different errors, and both look like accidental download failures (these can happen due to bad network connections or other random reasons). I recommend trying again, following our advice for such situations. It is also good to make sure pypath (pypath-omnipath) is at the latest version. @wbrett87 - no need to take screenshots of tracebacks; it is better to copy-paste the whole traceback here as text. If the error persists, please include not only the traceback, but also the pypath log.
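As a quick way to act on the "latest version" advice, the installed versions can be read from the package metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution names are the ones mentioned in this thread.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as mentioned in this thread.
for name in ("pypath-omnipath", "decoupler"):
    print(name, installed_version(name) or "not installed")
```

The printed versions can then be compared by hand against the latest releases, and pasted into the issue together with the traceback.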

arntetou commented 2 months ago

Hi @deeenes, thank you very much for your prompt reply. I tried again, also following your advice (https://pypath.omnipathdb.org/notebooks/manual.html#Download-failures), but I get the same error. These are the last lines of the pypath log file:

[2024-04-07 20:32:10] [taxonomy] Could not map to NCBI Taxonomy ID: Septoria tritici.
[2024-04-07 20:32:10] [taxonomy] Could not map to NCBI Taxonomy ID: Septoria tritici.
[2024-04-07 20:32:10] [taxonomy] Could not map to NCBI Taxonomy ID: Septoria tritici.
[2024-04-07 20:32:10] [taxonomy] Could not map to NCBI Taxonomy ID: Septoria tritici.
[2024-04-07 20:32:10] [taxonomy] Could not map to NCBI Taxonomy ID: Septoria tritici.
[2024-04-07 20:32:10] [orthology] Loading orthology data from OMA between organisms 9606 and 10090.
[2024-04-07 20:32:10] [curl] Creating Curl object to retrieve data from https://omabrowser.org/api/pairs/9606/10090/?page=1&per_page=1000
[2024-04-07 20:32:10] [curl] Cache file path: /home/arts10/.cache/pypath/62d730371f984c0d205b4529f9239c81-
[2024-04-07 20:32:10] [curl] Cache file found, no need for download.
[2024-04-07 20:32:10] [curl] Loading data from cache previously downloaded from omabrowser.org
[2024-04-07 20:32:10] [curl] Opening file /home/arts10/.cache/pypath/62d730371f984c0d205b4529f9239c81-
[2024-04-07 20:32:10] [curl] Extracting data from file type plain
[2024-04-07 20:32:10] [curl] Opening plain text file /home/arts10/.cache/pypath/62d730371f984c0d205b4529f9239c81-.
[2024-04-07 20:32:10] [curl] Contents of /home/arts10/.cache/pypath/62d730371f984c0d205b4529f9239c81- has been read and the file has been closed.
[2024-04-07 20:32:10] [curl] File at https://omabrowser.org/api/pairs/9606/10090/?page=1&per_page=1000 successfully retrieved. Resulted file type plain text, unicode string. Local file at /home/arts10/.cache/pypath/62d730371f984c0d205b4529f9239c81-.
[2024-04-07 20:32:10] [mapping] Requested to load ID translation table from uniprot-sec to uniprot-pri, organism: 9606.
[2024-04-07 20:32:10] [mapping] Chosen built-in defined ID translation table: resource=basic, id_type_a=uniprot-sec, id_type_b=uniprot-pri
[2024-04-07 20:32:10] [inputs] Selecting input method (step 1): module uniprot.get_uniprot_sec, method None.
[2024-04-07 20:32:10] [inputs] Selecting input method (step 2): module pypath.inputs.uniprot, method get_uniprot_sec.
[2024-04-07 20:32:10] [inputs] Importing module pypath.inputs.uniprot.
[2024-04-07 20:32:10] [mapping] Loading mapping table for organism 9606 with identifiers uniprot-sec and uniprot-pri, input type file
[2024-04-07 20:32:10] [mapping] Reader created for ID translation table, parameters: ncbi_tax_id=9606, id_a=uniprot-sec, id_b=uniprot-pri, load_a_to_b=1, load_b_to_a=0, input_type=file (FileMapping).
[2024-04-07 20:32:10] [inputs] Selecting input method (step 1): module uniprot.get_uniprot_sec, method None.
[2024-04-07 20:32:10] [inputs] Selecting input method (step 2): module pypath.inputs.uniprot, method get_uniprot_sec.
[2024-04-07 20:32:10] [inputs] Importing module pypath.inputs.uniprot.
[2024-04-07 20:32:10] [curl] Creating Curl object to retrieve data from ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/docs/sec_ac.txt
[2024-04-07 20:32:10] [curl] Cache file path: /home/arts10/.cache/pypath/7814fe9dc734379a8c28d4b1478d2f85-sec_ac.txt
[2024-04-07 20:32:10] [curl] Setting up and calling pycurl.
[2024-04-07 20:32:20] [curl] CURL DEBUG INFO: ERROR
[2024-04-07 20:32:20] [curl] PycURL error: (28, 'Connection timeout after 10001 ms')
[2024-04-07 20:32:30] [curl] CURL DEBUG INFO: ERROR
[2024-04-07 20:32:30] [curl] PycURL error: (28, 'Connection timeout after 10001 ms')
[2024-04-07 20:32:40] [curl] CURL DEBUG INFO: ERROR
[2024-04-07 20:32:40] [curl] PycURL error: (28, 'Connection timeout after 10001 ms')
[2024-04-07 20:32:40] [curl] Download error: HTTP 500
[2024-04-07 20:32:40] [curl] Download error: empty file retrieved.
[2024-04-07 20:32:40] [curl] First 5000 bytes of response:
[2024-04-07 20:32:40] [curl] Download failed, removing the resulted file.
[2024-04-07 20:32:40] [curl] Removing file: /home/arts10/.cache/pypath/7814fe9dc734379a8c28d4b1478d2f85-sec_ac.txt

It seems to be a download failure, as you mentioned above. How can I overcome this issue?

Thanks in advance for your help.

Best,

arntetou
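A log like the one quoted above can be scanned for the URL that actually failed. The following is only a sketch: it assumes the `[timestamp] [module] message` layout and the message wording seen in this excerpt, which may differ in other pypath versions, and the function names are hypothetical.

```python
import re

# Matches the "[YYYY-MM-DD HH:MM:SS] [module] " prefix seen in the excerpt.
ENTRY = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(\w+)\] ")


def split_entries(text):
    """Yield (timestamp, module, message) even if entries share one line."""
    matches = list(ENTRY.finditer(text))
    for m, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(text)
        yield m.group(1), m.group(2), text[m.end():end].strip()


def failed_downloads(text):
    """Return URLs whose retrieval was followed by a curl error message."""
    failed, current_url = [], None
    for _, module, message in split_entries(text):
        if module != "curl":
            continue
        if message.startswith("Creating Curl object to retrieve data from "):
            current_url = message.rsplit(" ", 1)[-1]
        elif message.startswith(("PycURL error", "Download error")):
            if current_url and current_url not in failed:
                failed.append(current_url)
    return failed


# Shortened sample built from entries in the log above.
SAMPLE = (
    "[2024-04-07 20:32:10] [curl] Creating Curl object to retrieve data from "
    "https://omabrowser.org/api/pairs/9606/10090/?page=1&per_page=1000 "
    "[2024-04-07 20:32:10] [curl] Cache file found, no need for download. "
    "[2024-04-07 20:32:10] [curl] Creating Curl object to retrieve data from "
    "ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/"
    "complete/docs/sec_ac.txt "
    "[2024-04-07 20:32:20] [curl] PycURL error: (28, 'Connection timeout after 10001 ms') "
    "[2024-04-07 20:32:40] [curl] Download error: HTTP 500"
)

print(failed_downloads(SAMPLE))
```

On the sample, only the `ftp://ftp.expasy.org/.../sec_ac.txt` URL is reported, since the OMA request was served from the cache without any error entry.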

deeenes commented 2 months ago

The URL it fails to download is ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/docs/sec_ac.txt, and the connection times out after 10 seconds, which is a very long time. The URL itself is fine: I have just downloaded it. Can you download it directly with curl?

curl -LO ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/docs/sec_ac.txt
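If curl is not at hand, a similar reachability check can be sketched in Python with the standard library, whose `urllib.request.urlopen` also handles `ftp://` URLs. The helper `can_download` and its `opener` parameter are hypothetical; the 10 second default only mirrors the timeout seen in the log and is not a pypath setting.

```python
import urllib.error
import urllib.request


def can_download(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (ok, detail); ok is True if at least one byte could be read."""
    try:
        with opener(url, timeout=timeout) as response:
            first_chunk = response.read(1024)
    except (urllib.error.URLError, OSError) as exc:
        # Covers timeouts, refused connections and blocked protocols.
        return False, f"{type(exc).__name__}: {exc}"
    if not first_chunk:
        return False, "empty response"
    return True, f"read {len(first_chunk)} bytes"
```

Calling `can_download("ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/docs/sec_ac.txt")` from the same machine and environment that runs decoupler should show whether FTP traffic gets through there at all.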

In any case, the same content is available over HTTP, so I updated the URL in saezlab/pypath@29aa404. You can reinstall pypath and try again:

pip install git+https://github.com/saezlab/pypath.git

arntetou commented 2 months ago

Hi @deeenes

Fantastic! It worked! Thanks a lot for your help.

Best,

arntetou