rvalieris / parallel-fastq-dump

parallel fastq-dump wrapper
MIT License

IndexError: list index out of range #25

Closed Maarten-vd-Sande closed 4 years ago

Maarten-vd-Sande commented 4 years ago

You can download the SRA here: https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos1/sra-pub-run-2/SRR2778062/SRR2778062.1

When I dump with 8 cores it fails, whereas normal fastq-dump works fine:

Read 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Written 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Read 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Written 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Read 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Written 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Read 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Written 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Read 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Written 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Read 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Written 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Read 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Written 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Read 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
Written 6432566 spots for /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062
2020-01-09T14:29:16 sra-stat.2.10.0 int: path incorrect while opening manager within database module - '/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/tmp'
SRR ids: ['/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062', '/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/tmp']
extra args: ['--split-spot', '--skip-technical', '--dumpbase', '--readids', '--clip', '--read-filter', 'pass', '--defline-seq', '@$ac.$si.$sg/$ri', '--defline-qual', '+', '--gzip']
tempdir: /tmp/pfd_7z0gpkz8
/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/SRR2778062 spots: 51460528
blocks: [[1, 6432566], [6432567, 12865132], [12865133, 19297698], [19297699, 25730264], [25730265, 32162830], [32162831, 38595396], [38595397, 45027962], [45027963, 51460528]]
tempdir: /tmp/pfd_l9hikqiw
Traceback (most recent call last):
  File "/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/.snakemake/conda/f4ec0168/bin/parallel-fastq-dump", line 4, in <module>
    __import__('pkg_resources').run_script('parallel-fastq-dump==0.6.5', 'parallel-fastq-dump')
  File "/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/.snakemake/conda/f4ec0168/lib/python3.8/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/.snakemake/conda/f4ec0168/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/.snakemake/conda/f4ec0168/lib/python3.8/site-packages/parallel_fastq_dump-0.6.5-py3.7.egg/EGG-INFO/scripts/parallel-fastq-dump", line 112, in <module>
  File "/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/.snakemake/conda/f4ec0168/lib/python3.8/site-packages/parallel_fastq_dump-0.6.5-py3.7.egg/EGG-INFO/scripts/parallel-fastq-dump", line 105, in main
  File "/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/.snakemake/conda/f4ec0168/lib/python3.8/site-packages/parallel_fastq_dump-0.6.5-py3.7.egg/EGG-INFO/scripts/parallel-fastq-dump", line 15, in pfd
  File "/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/.snakemake/conda/f4ec0168/lib/python3.8/site-packages/parallel_fastq_dump-0.6.5-py3.7.egg/EGG-INFO/scripts/parallel-fastq-dump", line 64, in get_spot_count
IndexError: list index out of range

Crash happens on this line: https://github.com/rvalieris/parallel-fastq-dump/blob/fcddfa058f8e8f6ab6adffd33023d42c41205bb2/parallel-fastq-dump#L64

rvalieris commented 4 years ago

Hello,

please include the full command line you used.

rvalieris commented 4 years ago

I can't reproduce the error with these arguments: parallel-fastq-dump --split-files --gzip -t 8 -s ~/tmp/SRR2778062.1

I suspect the arguments you used were wrong because of this line: 2020-01-09T14:29:16 sra-stat.2.10.0 int: path incorrect while opening manager within database module - '/home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/tmp'

To pass a temporary directory, you need to use --tmpdir.
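One way the sra-stat error quoted above could surface as an IndexError, sketched under assumptions: if the spot count is parsed by splitting each line of sra-stat output on `|`, then empty output (for example, when the path is a directory rather than an SRA file) leaves no third field to index. The function names and the `acc|member|spots:bases` line format below are assumptions for illustration, not the tool's actual code.

```python
def parse_spot_count(stat_output):
    """Sum spot counts from sra-stat-like text.

    Assumed (hypothetical) line format: 'acc|member|spots:bases'.
    """
    total = 0
    for line in stat_output.rstrip().split("\n"):
        fields = line.split("|")
        # With empty output, fields == [''] and fields[2] raises IndexError.
        total += int(fields[2].split(":")[0])
    return total


def safe_spot_count(stat_output, source):
    """Same parse, but fail with a clearer message when there is no output."""
    if not stat_output.strip():
        raise ValueError(f"no sra-stat output for {source!r}; is it an SRA file?")
    return parse_spot_count(stat_output)


if __name__ == "__main__":
    # Illustrative input only, not real sra-stat output.
    print(parse_spot_count("SRR0000000|x|100:5000\n"))
    try:
        parse_spot_count("")
    except IndexError as exc:
        print("IndexError:", exc)
```

Checking for empty output before parsing would turn the opaque IndexError into a message that names the offending path.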

Maarten-vd-Sande commented 4 years ago

Thanks for the reply, the full command is: parallel-fastq-dump -s /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/* -O /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/sra/SRR2778062/tmp --split-spot --skip-technical --dumpbase --readids --clip --read-filter pass --defline-seq '@$ac.$si.$sg/$ri' --defline-qual '+' --threads 8 --gzip >> /home/sande/Dropbox/Studie/PhD/snakemake-workflows/workflows/download_fastq/log/sra2fastq_SE/SRR2778062.log 2>&1

(I use split spot)

It is part of a pipeline which has already successfully downloaded over 300 samples, so it would be surprising if tmpdir were causing the issue. I'll try again tomorrow morning with --tmpdir and will let you know if the problem disappears.

Maarten-vd-Sande commented 4 years ago

So this is really weird: outside of the pipeline I can't get this to crash consistently (though it does sometimes crash), but inside the pipeline it always crashes. However, when I do not write the output to a folder inside the input folder, it no longer seems to crash. So I guess the issue is that I glob all the files from this folder as input?

Anyways thanks for your reply!

rvalieris commented 4 years ago

Yes, the glob jumped out at me as well. The tmp dir is inside the directory you are globbing, and I think this is the problem. If you want to glob a directory of SRA files, make sure there isn't anything else in it.
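If the input directory cannot be kept free of other entries, one option is to select only regular files before building the command, so that a subdirectory such as `tmp` is never passed as an SRA input. This is a minimal sketch; `list_sra_files` is a hypothetical helper, not part of parallel-fastq-dump.

```python
import os
import tempfile


def list_sra_files(directory):
    """Return sorted paths of regular files directly inside `directory`.

    Subdirectories (e.g. an output or tmp folder) are skipped.
    """
    paths = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.isfile(path):
            paths.append(path)
    return paths


if __name__ == "__main__":
    # Recreate the layout from this thread in a throwaway directory.
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "SRR2778062"), "w").close()
        os.mkdir(os.path.join(d, "tmp"))
        print(list_sra_files(d))  # only the SRR2778062 path, not tmp
```

The resulting list can be used in place of the shell glob. Writing the output to a directory outside the input folder, as described above, avoids the problem in the first place.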

Maarten-vd-Sande commented 4 years ago

Yep, that's how I "solved" it for now. It's still weird, though, that it worked for tons of other samples I've run it on before.

Thanks again!