rpetit3 / fastq-dl

Download FASTQ files from SRA or ENA repositories.
MIT License
268 stars 24 forks

Inconsistent exit codes #26

Open mortunco opened 1 month ago

mortunco commented 1 month ago


Great tool, super useful. Being able to pass one SRX and get all of its SRRs is a life saver in particular. Many thanks.

I have two scRNA-seq datasets; both examples are shown below. I am running fastq-dl strictly with --only-provider to get fastq.gz files. Neither sample has fastq.gz files in ENA, so technically both runs should exit with 1. But one finishes with exit status 0 (the first case) and the other exits with 1. I am curious what differs between these two samples to produce different results.
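To make the expectation concrete, here is a minimal sketch of a downloader that records per-run failures and turns them into a nonzero exit status. It is hypothetical and is not fastq-dl's actual code: `download_run`, `process_runs`, and the `available` mapping of accession to file list are made up for illustration.

```python
import logging


def download_run(run, available):
    """Hypothetical stand-in for a per-run download; returns True on success."""
    if not available.get(run):
        # Logging alone does not change the exit status of the process.
        logging.error("No fastqs found in ENA for %s", run)
        return False
    return True


def process_runs(runs, available):
    """Return an exit status: 0 if every run succeeded, 1 if any run failed."""
    failed = [run for run in runs if not download_run(run, available)]
    return 1 if failed else 0


# Assumed data: neither run has FASTQ files at the provider.
status = process_runs(
    ["ERR6212414", "ERR6212415"],
    {"ERR6212414": [], "ERR6212415": []},
)
print(status)
```

A command-line entry point would pass that value to `sys.exit()`. Without such bookkeeping, a run that only logs errors returns normally and the shell sees exit status 0.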

$ fastq-dl -a ERX5847526 --outdir raw-data/bodenmiller-36609566/bodenmiller-36609566-43739961/sequences/ --only-provider --verbose --cpus 2 > "raw-data/bodenmiller-36609566/bodenmiller-36609566-43739961/log/download_ENA.txt" && echo $?
2024-06-04 16:59:50 DEBUG    2024-06-04 16:59:50:root:DEBUG - Querying for metadata (Attempt 1 of 10)  fastq_dl.py:473
                    DEBUG    2024-06-04 16:59:50:root:DEBUG - --only-provider supplied, limiting queries to ena  fastq_dl.py:476
                    DEBUG    2024-06-04 16:59:50:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): www.ebi.ac.uk:443  connectionpool.py:1055
                    DEBUG    2024-06-04 16:59:50:urllib3.connectionpool:DEBUG - https://www.ebi.ac.uk:443 "GET /ena/portal/api/search?result=read_run&format=tsv&query=%22experiment_accession=ERX5847526%22&fields=all HTTP/1.1" 200 None  connectionpool.py:549
                    INFO     2024-06-04 16:59:50:root:INFO - Query: ERX5847526  fastq_dl.py:723
                    INFO     2024-06-04 16:59:50:root:INFO - Archive: ena  fastq_dl.py:724
                    INFO     2024-06-04 16:59:50:root:INFO - Total Runs To Download: 2  fastq_dl.py:729
                    INFO     2024-06-04 16:59:50:root:INFO -         Working on run ERR6212414...  fastq_dl.py:748
                    ERROR    2024-06-04 16:59:50:root:ERROR -        No fastqs found in ENA for ERR6212414  fastq_dl.py:761
                    INFO     2024-06-04 16:59:50:root:INFO -         Working on run ERR6212415...  fastq_dl.py:748
                    ERROR    2024-06-04 16:59:50:root:ERROR -        No fastqs found in ENA for ERR6212415  fastq_dl.py:761
                    INFO     2024-06-04 16:59:50:root:INFO - Writing metadata to raw-data/bodenmiller-36609566/bodenmiller-36609566-43739961/sequences//fastq-run-info.tsv  fastq_dl.py:844
$ fastq-dl -a SRX12493302 --outdir raw-data/aifantis-36581735/aifantis-36581735-9d14387e/sequences/ --only-provider --verbose --cpus 8
2024-06-04 17:02:31 DEBUG    2024-06-04 17:02:31:root:DEBUG - Querying for metadata (Attempt 1 of 10)  fastq_dl.py:473
                    DEBUG    2024-06-04 17:02:31:root:DEBUG - --only-provider supplied, limiting queries to ena  fastq_dl.py:476
                    DEBUG    2024-06-04 17:02:31:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): www.ebi.ac.uk:443  connectionpool.py:1055
2024-06-04 17:02:32 DEBUG    2024-06-04 17:02:32:urllib3.connectionpool:DEBUG - https://www.ebi.ac.uk:443 "GET /ena/portal/api/search?result=read_run&format=tsv&query=%22experiment_accession=SRX12493302%22&fields=all HTTP/1.1" 200 7559  connectionpool.py:549
                    INFO     2024-06-04 17:02:32:root:INFO - Query: SRX12493302  fastq_dl.py:723
                    INFO     2024-06-04 17:02:32:root:INFO - Archive: ena  fastq_dl.py:724
                    INFO     2024-06-04 17:02:32:root:INFO - Total Runs To Download: 1  fastq_dl.py:729
                    INFO     2024-06-04 17:02:32:root:INFO -         Working on run SRR16208970...  fastq_dl.py:748
                    ERROR    2024-06-04 17:02:32:root:ERROR -        No fastqs found in ENA for SRR16208970  fastq_dl.py:761
                    INFO     2024-06-04 17:02:32:root:INFO - Writing metadata to raw-data/aifantis-36581735/aifantis-36581735-9d14387e/sequences//fastq-run-info.tsv  fastq_dl.py:844
Traceback (most recent call last):
  File "/home/ec2-user/sc-ftdb/conda/envs/download/bin/fastq-dl", line 10, in <module>
  File "/home/ec2-user/sc-ftdb/conda/envs/download/lib/python3.10/site-packages/fastq_dl/fastq_dl.py", line 852, in main
  File "/home/ec2-user/sc-ftdb/conda/envs/download/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ec2-user/sc-ftdb/conda/envs/download/lib/python3.10/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
  File "/home/ec2-user/sc-ftdb/conda/envs/download/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ec2-user/sc-ftdb/conda/envs/download/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ec2-user/sc-ftdb/conda/envs/download/lib/python3.10/site-packages/fastq_dl/fastq_dl.py", line 845, in fastqdl
    write_tsv(ena_data, f"{outdir}/{prefix}-run-info.tsv")
  File "/home/ec2-user/sc-ftdb/conda/envs/download/lib/python3.10/site-packages/fastq_dl/fastq_dl.py", line 544, in write_tsv
    with open(output, "w") as fh:
FileNotFoundError: [Errno 2] No such file or directory: 'raw-data/aifantis-36581735/aifantis-36581735-9d14387e/sequences//fastq-run-info.tsv'
$ echo $?
$ fastq-dl --version
fastq-dl, version 2.0.4

$ uname -a
Linux XXX.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May  7 11:11:31 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Any idea what could be different?

Thanks in advance, Best, T.

rpetit3 commented 1 month ago

Hi @mortunco

I think you might have found a fun bug. Do you know whether the directory passed to --outdir already existed in each of these cases?
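For context on why the output directory matters: the traceback above ends in `open(output, "w")` raising `FileNotFoundError`, which is what Python does when the parent directory of the target file does not exist. The sketch below reproduces that in a temporary directory and shows one common way to avoid it. `write_tsv_safely` is a hypothetical helper written for this illustration, not the `write_tsv` function in fastq-dl, and nothing here implies this is how the project handles it.

```python
import tempfile
from pathlib import Path


def write_tsv_safely(rows, output):
    """Hypothetical TSV writer that creates the parent directory before writing."""
    output = Path(output)
    # parents=True creates intermediate directories; exist_ok=True tolerates reruns.
    output.parent.mkdir(parents=True, exist_ok=True)
    with open(output, "w") as fh:
        for row in rows:
            fh.write("\t".join(row) + "\n")
    return output


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sequences" / "fastq-run-info.tsv"
    try:
        # The "sequences" directory does not exist yet, so a plain open() fails.
        open(target, "w").close()
    except FileNotFoundError:
        print("plain open() failed: parent directory is missing")
    write_tsv_safely([["run_accession"], ["SRR16208970"]], target)
    print(target.exists())
```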

Also, can you try running the second command with --only-download-metadata? I'm also going to replicate on my end to see why one works and the other doesn't.

Cheers, Robert

mortunco commented 1 month ago
$ fastq-dl -a ERX5847526 --outdir case_1/ --only-provider --verbose --cpus 2 --only-download-metadata
2024-06-05 20:45:53 DEBUG    2024-06-05 20:45:53:root:DEBUG - Querying for metadata (Attempt 1 of 10)  fastq_dl.py:473
                    DEBUG    2024-06-05 20:45:53:root:DEBUG - --only-provider supplied, limiting queries to ena  fastq_dl.py:476
                    DEBUG    2024-06-05 20:45:53:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): www.ebi.ac.uk:443  connectionpool.py:1055
2024-06-05 20:45:55 DEBUG    2024-06-05 20:45:55:urllib3.connectionpool:DEBUG - https://www.ebi.ac.uk:443 "GET /ena/portal/api/search?result=read_run&format=tsv&query=%22experiment_accession=ERX5847526%22&fields=all HTTP/1.1" 200 None  connectionpool.py:549
                    INFO     2024-06-05 20:45:55:root:INFO - Query: ERX5847526  fastq_dl.py:723
                    INFO     2024-06-05 20:45:55:root:INFO - Archive: ena  fastq_dl.py:724
                    INFO     2024-06-05 20:45:55:root:INFO - Total Runs Found: 2  fastq_dl.py:726
                    DEBUG    2024-06-05 20:45:55:root:DEBUG - --only-download-metadata used, skipping FASTQ downloads  fastq_dl.py:727
                    INFO     2024-06-05 20:45:55:root:INFO - Writing metadata to case_1//fastq-run-info.tsv
$ fastq-dl -a SRX12493302 --outdir case_2/ --only-provider --verbose --cpus 2 --only-download-metadata
2024-06-05 20:46:15 DEBUG    2024-06-05 20:46:15:root:DEBUG - Querying for metadata (Attempt 1 of 10)  fastq_dl.py:473
                    DEBUG    2024-06-05 20:46:15:root:DEBUG - --only-provider supplied, limiting queries to ena  fastq_dl.py:476
                    DEBUG    2024-06-05 20:46:15:urllib3.connectionpool:DEBUG - Starting new HTTPS connection (1): www.ebi.ac.uk:443  connectionpool.py:1055
2024-06-05 20:46:17 DEBUG    2024-06-05 20:46:17:urllib3.connectionpool:DEBUG - https://www.ebi.ac.uk:443 "GET /ena/portal/api/search?result=read_run&format=tsv&query=%22experiment_accession=SRX12493302%22&fields=all HTTP/1.1" 200 7559  connectionpool.py:549
                    INFO     2024-06-05 20:46:17:root:INFO - Query: SRX12493302  fastq_dl.py:723
                    INFO     2024-06-05 20:46:17:root:INFO - Archive: ena  fastq_dl.py:724
                    INFO     2024-06-05 20:46:17:root:INFO - Total Runs Found: 1  fastq_dl.py:726
                    DEBUG    2024-06-05 20:46:17:root:DEBUG - --only-download-metadata used, skipping FASTQ downloads  fastq_dl.py:727
                    INFO     2024-06-05 20:46:17:root:INFO - Writing metadata to case_2//fastq-run-info.tsv  fastq_dl.py:736


I have shared the output fastq-run-info.tsv files.

My naive guess is that --only-provider may be causing one run to succeed and the other to fail?
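One general Python behavior is relevant when comparing the two exit codes: a program that only logs an error and then returns normally exits with status 0, while a program that lets an exception propagate to the top level prints a traceback and exits with status 1. The sketch below demonstrates this with two tiny child processes. The snippets and the `exit_code` helper are made up for illustration and do not call fastq-dl, and the directory name is assumed not to exist.

```python
import subprocess
import sys


def exit_code(source):
    """Run a Python snippet in a child interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    return result.returncode


# Logs an error but finishes normally.
logged_only = "import logging; logging.error('No fastqs found')"

# Lets FileNotFoundError escape, as in the traceback from the second run.
uncaught = "open('definitely-missing-dir-xyz/fastq-run-info.tsv', 'w')"

print(exit_code(logged_only))  # 0
print(exit_code(uncaught))     # 1
```

Under that reading, the differing statuses in the original two runs would come from the crash while writing the metadata file rather than from the missing FASTQ files, though that remains an assumption until it is reproduced.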