antonkulaga closed this issue 5 years ago
Hello,
Yes, downloading with fastq-dump is slow. That's why I recommend in the README to use prefetch
to download the data first and then do the dumping with fastq-dump.
parallel-fastq-dump is just a wrapper that parallelizes fastq-dump, so downloading through it is still slow.
Also note that controlled-access data from dbGaP cannot be downloaded from the public FTP archive, so you have to use sra-tools for those.
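For anyone who wants to script the "prefetch first, then dump" order described above, here is a minimal sketch in Python. It is not part of parallel-fastq-dump itself. The function names `build_commands` and `download_and_dump` are made up for illustration, and the defaults (4 threads, a `fastq` output directory, `--split-files`, `--gzip`) are assumptions you should adjust to your data. It also assumes `prefetch` and `parallel-fastq-dump` are on your PATH.

```python
import subprocess


def build_commands(accession, threads=4, outdir="fastq"):
    """Return two argv lists: download first, then dump in parallel.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    # Step 1: prefetch downloads the .sra file so that the dump step
    # does not have to fetch anything over the network.
    prefetch_cmd = ["prefetch", accession]
    # Step 2: parallel-fastq-dump now only has to do the dumping.
    dump_cmd = [
        "parallel-fastq-dump",
        "--sra-id", accession,
        "--threads", str(threads),
        "--outdir", outdir,
        "--split-files",  # passed through to fastq-dump (assumed paired-end data)
        "--gzip",         # passed through to fastq-dump (compress the output)
    ]
    return [prefetch_cmd, dump_cmd]


def download_and_dump(accession, threads=4, outdir="fastq"):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(accession, threads, outdir):
        subprocess.run(cmd, check=True)
```

Calling `download_and_dump("SRR304976")` would run the two commands one after the other. `build_commands` is kept separate so you can print and check the commands before running anything.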
Dear Renan,
I quickly want to take the opportunity to thank you for this very nice and convenient wrapper. Using a 72-core node, it enabled me to convert about 5 TB of SRA files from a WGS cohort in less than a day. Even the newer fasterq-dump is no replacement for your wrapper. Thank you very much!
Thanks for the kind words Alexander, you are welcome! I am glad to know the tool is useful.
I download stuff with wget ten times faster. As an example, wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra takes 60 seconds, with 2 minutes of follow-up extraction, while parallel-fastq-dump spends half an hour with 4 threads.
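To make the wget approach repeatable for other runs, the FTP path can be built from the run accession. The sketch below is a guess at the directory layout, inferred only from the single URL above (first three characters, then first six characters, then the full accession). The function name `sra_ftp_url` is hypothetical. The pattern may not hold for every accession, and the archive may not keep serving this layout, so check a URL before relying on it.

```python
def sra_ftp_url(accession):
    """Build the public FTP URL for a run, following the layout of the example URL.

    Hypothetical helper: the layout is inferred from one URL, not from NCBI docs.
    """
    base = "ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra"
    prefix = accession[:3]  # e.g. "SRR"
    group = accession[:6]   # e.g. "SRR304"
    return f"{base}/{prefix}/{group}/{accession}/{accession}.sra"


# Print the URL so it can be handed to wget.
print(sra_ftp_url("SRR304976"))
```

The printed URL can then be passed to wget as in the command above. As noted in the thread, controlled-access dbGaP data is not available this way.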