rvalieris / parallel-fastq-dump

parallel fastq-dump wrapper
MIT License
265 stars, 33 forks

superslow #18

Closed antonkulaga closed 5 years ago

antonkulaga commented 5 years ago

I download stuff with wget ten times faster. As an example, wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra takes 60 seconds, with 2 minutes of follow-up extraction, while parallel-fastq-dump spends half an hour with 4 threads.

rvalieris commented 5 years ago

Hello, yes, downloading stuff with fastq-dump is slow. That's why I recommend in the README to use prefetch to download the data first and then do the dumping with fastq-dump.

parallel-fastq-dump is just a wrapper that parallelizes fastq-dump; it speeds up the dumping, but downloading through it is still slow.
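A minimal sketch of that prefetch-then-dump workflow, assuming prefetch and parallel-fastq-dump are both on the PATH. The helper names build_commands and run_pipeline are made up for illustration, and the --split-files and --gzip options are optional fastq-dump flags that may not suit every dataset:

```python
import shlex
import subprocess


def build_commands(accession, threads=4, outdir="out"):
    """Return the two command lines: download first, then dump in parallel."""
    # Step 1: prefetch downloads the .sra file so sra-tools can find it locally.
    prefetch_cmd = ["prefetch", accession]
    # Step 2: parallel-fastq-dump then only has to convert the downloaded data.
    dump_cmd = [
        "parallel-fastq-dump",
        "--sra-id", accession,
        "--threads", str(threads),
        "--outdir", outdir,
        "--split-files",  # optional: one file per read of a pair
        "--gzip",         # optional: compress the FASTQ output
    ]
    return [prefetch_cmd, dump_cmd]


def run_pipeline(accession, threads=4, outdir="out", dry_run=True):
    """Print the commands; execute them only when dry_run is False."""
    for cmd in build_commands(accession, threads, outdir):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling run_pipeline("SRR304976") only prints the two commands; passing dry_run=False would actually run them, so the slow network transfer happens once in prefetch and the threads are spent on dumping.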

Also note that controlled-access data from dbGaP cannot be downloaded from the public FTP archive, so you have to use sra-tools for those.

ATpoint commented 5 years ago

Dear Renan,

I quickly want to take the opportunity to thank you for this very nice and convenient wrapper. Using a 72-core node, it enabled me to convert about 5 TB of SRA files from a WGS cohort in less than a day. Even the newer fasterq-dump is no replacement for your wrapper. Thank you very much!

rvalieris commented 5 years ago

Thanks for the kind words, Alexander, you are welcome! I am glad to know the tool is useful.