rvalieris / parallel-fastq-dump

parallel fastq-dump wrapper
MIT License

parallel-fastq-dump not working any more #47

Closed guandailu closed 1 year ago

guandailu commented 2 years ago

Recently I found that parallel-fastq-dump is no longer working. I installed the latest version from conda.

rvalieris commented 2 years ago

hello,

please give me more details, command line, error messages, SRA ids you tried, etc.

guandailu commented 1 year ago

I installed the tool using conda, and it worked before. Now it seems to have some issues. My command is:

parallel-fastq-dump --sra-id SRR10024973 --threads 4 --outdir out/ --split-files --gzip

The error is below:

2022-07-26 14:24:00,050 - SRR ids: ['SRR10024973']
2022-07-26 14:24:00,051 - extra args: ['--split-files', '--gzip']
2022-07-26 14:24:00,051 - tempdir: /tmp/pfd_g54g5deb
2022-07-26 14:24:00,051 - CMD: sra-stat --meta --quick SRR10024973
Traceback (most recent call last):
  File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 116, in get_spot_count
    total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 181, in <module>
    main()
  File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 175, in main
    pfd(args, si, extra_args)
  File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 49, in pfd
    n_spots = get_spot_count(srr_id)
  File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 122, in get_spot_count
    raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--

--sra-stat STDERR--
2022-07-26T21:24:00 sra-stat.2.8.0 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -9984 ( X509 - Certificate verification failed, e.g. CRL, CA or signature check failed )
2022-07-26T21:24:00 sra-stat.2.8.0 sys: mbedtls_ssl_get_verify_result returned 0x8 ( !! The certificate is not correctly signed by the trusted CA )
2022-07-26T21:24:00 sra-stat.2.8.0 err: no error - error with http open 'https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR10024973/SRR10024973'
2022-07-26T21:24:01 sra-stat.2.8.0 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -9984 ( X509 - Certificate verification failed, e.g. CRL, CA or signature check failed )
2022-07-26T21:24:01 sra-stat.2.8.0 sys: mbedtls_ssl_get_verify_result returned 0x8 ( !! The certificate is not correctly signed by the trusted CA )
2022-07-26T21:24:01 sra-stat.2.8.0 err: no error - error with http open 'https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR10024973/SRR10024973'
2022-07-26T21:24:01 sra-stat.2.8.0 int: connection failed while opening file within cryptographic module - 'SRR10024973'

rvalieris commented 1 year ago

looks like you are using sratools version 2.8.0, you need to update to a more recent version.

I tested with sratools 2.11.0 and it worked.
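
For anyone triaging a similar report: the sra-tools version is visible in every stderr line of the log, right after the timestamp (for example sra-stat.2.8.0). The snippet below is only a rough sketch of how that token could be pulled out and compared against a minimum version. The function names tool_version and is_older_than are made up for this illustration and are not part of parallel-fastq-dump or sra-tools, and the sample log line is abbreviated from the one posted above.

```python
import re

# Matches the "<tool>.<major>.<minor>.<patch>" token that follows the timestamp
# in sra-tools log lines such as "... sra-stat.2.8.0 sys: ...".
_TOOL_VERSION = re.compile(r"\s([a-z][a-z-]*)\.(\d+)\.(\d+)\.(\d+)\s")


def tool_version(log_line):
    """Return (tool_name, (major, minor, patch)), or None if no token is found."""
    match = _TOOL_VERSION.search(log_line)
    if match is None:
        return None
    return match.group(1), tuple(int(part) for part in match.group(2, 3, 4))


def is_older_than(log_line, minimum):
    """True if the tool version in log_line is lower than `minimum` (a 3-tuple)."""
    found = tool_version(log_line)
    if found is None:
        raise ValueError("no sra-tools version token in log line")
    return found[1] < minimum


# Abbreviated copy of a stderr line from the report above.
line = ("2022-07-26T21:24:00 sra-stat.2.8.0 sys: connection failed while "
        "opening file within cryptographic module")
print(tool_version(line))               # ('sra-stat', (2, 8, 0))
print(is_older_than(line, (2, 11, 0)))  # True
```

The (2, 11, 0) threshold is just the version reported as working in this comment, not an official minimum.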

guandailu commented 1 year ago

I installed the software using conda, so how can I update it within the conda env?

rvalieris commented 1 year ago

with the env activated, try: conda install 'sra-tools>=2.11.0'

guandailu commented 1 year ago

Finally, "conda install -c bioconda sra-tools=2.10" works.

hyjforesight commented 1 year ago

hello @guandailu @rvalieris I installed sratools 2.10, but the errors still continue. Could you please help me with this issue? Thanks!

# install parallel-fastq-dump and sra-tools v2.10
conda config --add channels bioconda
conda install parallel-fastq-dump
conda install -c bioconda sra-tools=2.10

parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRR15652839.sra /mnt/d/HYJ/dbGap/sra/SRR15653095.sra /mnt/d/HYJ/dbGap/sra/SRR15653115.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-07 12:21:28,266 - SRR ids: ['/mnt/d/HYJ/dbGap/sra/SRR15652839.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653095.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653115.sra']
2022-09-07 12:21:28,266 - extra args: ['--split-files', '--gzip']
2022-09-07 12:21:28,270 - tempdir: /tmp/pfd_hjas65p0
2022-09-07 12:21:28,270 - CMD: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
    total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
    main()
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
    pfd(args, si, extra_args)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
    n_spots = get_spot_count(srr_id)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
    raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--

--sra-stat STDERR--
2022-09-07T17:21:29 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'

guandailu commented 1 year ago

My installation steps are:

conda install -c bioconda parallel-fastq-dump -n parallel-fastq-dump -m
conda install -c bioconda sra-tools=2.10 -n parallel-fastq-dump

To use it:

conda activate parallel-fastq-dump
parallel-fastq-dump -h

rvalieris commented 1 year ago

this is a dbGaP-controlled file, you need permission to download it.

if you already have access set up, you need to cd into the directory configured in vdb-config and run the command from there, for example:

cd /mnt/d/HYJ/dbGap/sra/
parallel-fastq-dump --sra-id SRR15652839  --threads 16 --outdir out --split-files --gzip

hyjforesight commented 1 year ago

hello @rvalieris Thanks for the response. Yes, this is dbGaP-controlled data and we have access to download all of it. The weird thing is that we downloaded 453 of the 456 files and successfully converted them to fastq with parallel-fastq-dump v0.6.7 and sratools v2.8.0 (installed automatically with parallel-fastq-dump):

parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRRxxxx.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip

However, these 3 SRA files (SRR15652839, SRR15653095, SRR15653115) could not be downloaded until the dbGaP team reloaded them last week. We then ran the same command, but got these errors:

parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRR15652839.sra /mnt/d/HYJ/dbGap/sra/SRR15653095.sra /mnt/d/HYJ/dbGap/sra/SRR15653115.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-07 12:21:28,266 - SRR ids: ['/mnt/d/HYJ/dbGap/sra/SRR15652839.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653095.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653115.sra']
2022-09-07 12:21:28,266 - extra args: ['--split-files', '--gzip']
2022-09-07 12:21:28,270 - tempdir: /tmp/pfd_hjas65p0
2022-09-07 12:21:28,270 - CMD: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
    total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
    main()
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
    pfd(args, si, extra_args)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
    n_spots = get_spot_count(srr_id)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
    raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--

--sra-stat STDERR--
2022-09-07T17:21:29 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'

I followed your suggestion and ran it from inside the directory I configured, but it still cannot convert the file:

hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ parallel-fastq-dump --sra-id SRR15652839 --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-08 10:47:18,820 - SRR ids: ['SRR15652839']
2022-09-08 10:47:18,820 - extra args: ['--split-files', '--gzip']
2022-09-08 10:47:18,825 - tempdir: /tmp/pfd_uk3ma0fl
2022-09-08 10:47:18,825 - CMD: sra-stat --meta --quick SRR15652839
Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
    total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
    main()
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
    pfd(args, si, extra_args)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
    n_spots = get_spot_count(srr_id)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
    raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--

--sra-stat STDERR--
2022-09-08T15:47:20 sra-stat.2.10.0 err: query unauthorized while resolving query within virtual file system module - failed to resolve accession 'SRR15652839' - Access denied - please request permission to access phs002407 / GRU in dbGaP. ( 403 )
2022-09-08T15:47:20 sra-stat.2.10.0 err: query unauthorized while resolving query within virtual file system module - failed to resolve accession 'SRR15652839' - Access denied - please request permission to access phs002407 / GRU in dbGaP. ( 403 )
2022-09-08T15:47:20 sra-stat.2.10.0 int: directory not found while opening manager within virtual file system module - 'SRR15652839'

hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRR15652839.sra /mnt/d/HYJ/dbGap/sra/SRR15653095.sra /mnt/d/HYJ/dbGap/sra/SRR15653115.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-08 10:48:03,947 - SRR ids: ['/mnt/d/HYJ/dbGap/sra/SRR15652839.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653095.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653115.sra']
2022-09-08 10:48:03,947 - extra args: ['--split-files', '--gzip']
2022-09-08 10:48:03,952 - tempdir: /tmp/pfd_au0bpbmv
2022-09-08 10:48:03,952 - CMD: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
    total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
    main()
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
    pfd(args, si, extra_args)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
    n_spots = get_spot_count(srr_id)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
    raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--

--sra-stat STDERR--
2022-09-08T15:48:04 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'

I think the SRA team may have changed something in the SRA files, so that parallel-fastq-dump works only for the old files and not the new ones. Is it possible to solve this issue? Thanks! Best, YJ

rvalieris commented 1 year ago

I see, try to run this command to see what happens: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra

this should return a table with the number of spots (reads). parallel-fastq-dump uses this to decide how many spots to give each thread, but the error "IndexError: list index out of range" indicates the output is not what was expected.
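
To make the parsing step concrete, here is a minimal Python sketch of it. The expression split('|')[2].split(':')[0] is taken from the traceback in this thread, and the sample line is the successful sra-stat output for SRR11770344 quoted elsewhere in the thread. Everything else is an assumption for illustration: the function names spot_count and split_spots are hypothetical, and the way spots are divided among threads is a simplified stand-in, not the tool's actual code.

```python
def spot_count(sra_stat_stdout):
    """Sum the spot counts found in `sra-stat --meta --quick` output."""
    total = 0
    parsed = 0
    for line in sra_stat_stdout.splitlines():
        if not line.strip():
            continue
        fields = line.split('|')
        if len(fields) < 3:
            raise ValueError("unexpected sra-stat line: %r" % line)
        # The third '|' field looks like "spots:bases:...", keep the spot count.
        total += int(fields[2].split(':')[0])
        parsed += 1
    if parsed == 0:
        # Empty stdout, as in the failing runs where sra-stat only wrote to stderr.
        raise ValueError("sra-stat produced no output to parse")
    return total


def split_spots(total, threads):
    """Divide spots 1..total into up to `threads` contiguous (start, end) ranges."""
    base, extra = divmod(total, threads)
    ranges = []
    start = 1
    for i in range(threads):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        ranges.append((start, start + size - 1))
        start += size
    return ranges


sample = "/mnt/d/HYJ/dbGap/sra/SRR11770344.sra||153886278:19081898472:19081898472|:|:|:"
print(spot_count(sample))   # 153886278
print(split_spots(10, 4))   # [(1, 3), (4, 6), (7, 8), (9, 10)]
```

Each (start, end) pair would then be handed to a separate fastq-dump process as its minimum and maximum spot id. With empty stdout there is nothing to split, which is why the failing runs stop at this step.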

hyjforesight commented 1 year ago

thanks for the quick response, @rvalieris Please see the results

hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
2022-09-08T16:13:11 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'

Here is also a positive control, a file that I can convert to fastq with parallel-fastq-dump:

hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR11770344.sra
/mnt/d/HYJ/dbGap/sra/SRR11770344.sra||153886278:19081898472:19081898472|:|:|:

Thanks!

rvalieris commented 1 year ago

I think this could be due to a change in sra-tools 2.10.0, maybe this will help: https://github.com/ncbi/sra-tools/wiki/First-help-on-decryption-dbGaP-data

hyjforesight commented 1 year ago

hello @rvalieris, thanks for the information. I tried that approach in the Windows command prompt. It didn't work either. I'm emailing the SRA team about this issue.

C:\Users\Park_Lab\Downloads\sratoolkit.3.0.0-win64\bin>fasterq-dump --ngc C:\Users\Park_Lab\Downloads\prj_32846.ngc D:\HYJ\dbGap\sra\SRR15653115.sra
2022-09-08T17:23:42 fasterq-dump.3.0.0 err: libs/vfs/names4-response.c:2273:Response4StatusInit: error unexpected while resolving query within virtual file system module - No accession to process ( 500 )
Failed to call external services.

I think the SRA team changed the encryption algorithm for SRA files, so that parallel-fastq-dump works only for the old encrypted SRA files. That may also be why they introduced sra-tools v3.0. Is there any plan to upgrade parallel-fastq-dump? Thanks! Best, YJ

rvalieris commented 1 year ago

Regarding "error unexpected while resolving query within virtual file system module - No accession to process": try to use just the SRR id instead of the path.

you could try also the previous parallel-fastq-dump cmdline, but add the --ngc C:\Users\Park_Lab\Downloads\prj_32846.ngc argument.

if none of this works, then contacting the SRA team seems like the best idea. parallel-fastq-dump uses sra-tools internally as well, so if you can get fastq-dump / fasterq-dump to work, parallel-fastq-dump should work too.
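
A small sketch of the two suggestions above, in case it helps with testing the underlying tool directly: reduce a file path to the bare SRR id, then build the command with the --ngc argument. The helper names accession_id and direct_dump_command are invented for this example. The --ngc flag and the file names are copied from the commands in this thread, and nothing is executed here, the command is only assembled as a list.

```python
from pathlib import PureWindowsPath


def accession_id(path_or_id):
    """Drop any directory and the final suffix, e.g. '.sra', leaving the run id.

    PureWindowsPath accepts both '\\' and '/' separators, so WSL-style and
    Windows-style paths are handled the same way. A bare id passes through.
    """
    return PureWindowsPath(path_or_id).stem


def direct_dump_command(paths_or_ids, ngc_file, tool="fasterq-dump"):
    """Build an argv list that calls the underlying sra-tools program directly."""
    return [tool, "--ngc", ngc_file] + [accession_id(p) for p in paths_or_ids]


cmd = direct_dump_command(
    [r"D:\HYJ\dbGap\sra\SRR15653115.sra", "/mnt/d/HYJ/dbGap/sra/SRR15652839.sra"],
    r"C:\Users\Park_Lab\Downloads\prj_32846.ngc",
)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run to check whether the underlying tool succeeds on its own before involving parallel-fastq-dump.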

hyjforesight commented 1 year ago

hello @rvalieris The SRA team told me that some current SRA files are no longer supported by SRA Toolkit versions below 3.0. That's why parallel-fastq-dump doesn't work. I ran the command below, and it works.

fasterq-dump --ngc C:\Users\Park_Lab\Downloads\prj_32846.ngc SRR15652839 SRR15653095 SRR15653115 --threads 16 --outdir D:\HYJ\dbGap\sra\ --split-files --include-technical

Thanks! Best, YJ