ncbi / sra-tools

SRA Tools

question about fastq-dump #910

Closed: Flu09 closed this issue 3 months ago.

Flu09 commented 4 months ago

Hello, after prefetch I use fastq-dump to get the R1 and R2 files in their original format:

fastq-dump --split-files -F
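The command above leaves out the accession. Here is a minimal Python sketch for scripting it over prefetched runs. It assumes `fastq-dump` is on `PATH`. `build_fastq_dump_cmd` and `run` are hypothetical helper names, and `SRRXXXXXXX` is only a placeholder, not a real accession. The flags are the ones from the question: `--split-files` writes mates to separate `_1`/`_2` files, and `-F` (`--origfmt`) keeps the original read names.

```python
import shutil
import subprocess
from typing import List, Optional


def build_fastq_dump_cmd(accession: str, outdir: Optional[str] = None) -> List[str]:
    """Build the fastq-dump call from the question (hypothetical helper)."""
    # --split-files: one FASTQ file per read (R1 -> _1, R2 -> _2)
    # -F (--origfmt): defline contains only the original read name
    cmd = ["fastq-dump", "--split-files", "-F"]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    cmd.append(accession)
    return cmd


def run(accession: str, outdir: Optional[str] = None) -> None:
    """Run the conversion, failing early if the SRA Toolkit is missing."""
    if shutil.which("fastq-dump") is None:
        raise FileNotFoundError("fastq-dump not found on PATH")
    subprocess.run(build_fastq_dump_cmd(accession, outdir), check=True)


if __name__ == "__main__":
    # "SRRXXXXXXX" is a placeholder; substitute your own prefetched accession.
    print(" ".join(build_fastq_dump_cmd("SRRXXXXXXX", outdir="fastq")))
```

Building the argument list separately from running it makes the command easy to log or to paste into a batch script.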

I have 3 questions:
1. Splitting with fastq-dump takes a long time. Is there an argument for the number of threads or the amount of RAM to make it faster?
2. What are the RAM and CPU requirements (or defaults) if I use Slurm?
3. What is the fasterq-dump equivalent of this command?

wraetz commented 4 months ago

fastq-dump is an older tool; we are replacing it with fasterq-dump. You can get it from our download site: [https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/]

There are some differences in command-line arguments between fastq-dump and fasterq-dump. For instance, the split mode of fasterq-dump defaults to split-3.

With the newer tool you can set the number of threads. It defaults to 6 threads, and we have observed that more than 8 threads do not speed up the conversion further (yes, even on machines with 96 or more cores).

Regarding RAM usage: that depends on the accession you are using. The accession has the most influence, and there is nothing you can do about it. (fasterq-dump does have some command-line parameters for memory usage, but their influence is minimal.) Read the output of "fasterq-dump -h"; at the bottom are some links to more information.

The biggest difference from the older tool is that fasterq-dump creates temporary files to speed up the conversion, so you will need additional space on your hard drive.
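The points in this reply can be turned into a rough fasterq-dump counterpart of the original command. This Python sketch rests on several assumptions. It requests `--split-files` explicitly because the default mode is split-3. It caps threads at 8, following the observation above. It points `--temp` at a scratch directory, since temporary files need extra disk space. `scratch_bytes_needed` and the Slurm `mem` value are hypothetical inputs you must estimate yourself, because memory and temp usage depend on the accession. All helper names are illustrative. The sketch does not claim to reproduce fastq-dump's `-F`; check `fasterq-dump -h` for defline options.

```python
import shutil
from typing import List

# From the reply: more than 8 threads gave no further speed-up.
MAX_USEFUL_THREADS = 8


def clamp_threads(threads: int) -> int:
    """Keep the thread count between 1 and the observed useful maximum."""
    return max(1, min(threads, MAX_USEFUL_THREADS))


def build_fasterq_dump_cmd(accession: str, threads: int, tmpdir: str, outdir: str) -> List[str]:
    """Hypothetical helper: fasterq-dump call roughly matching `fastq-dump --split-files`."""
    return [
        "fasterq-dump",
        "--split-files",               # default would be split-3, so ask explicitly
        "--threads", str(clamp_threads(threads)),
        "--temp", tmpdir,              # temporary files are written here
        "--outdir", outdir,
        accession,
    ]


def has_scratch_space(tmpdir: str, scratch_bytes_needed: int) -> bool:
    """True if tmpdir's filesystem has at least the (user-estimated) free space."""
    return shutil.disk_usage(tmpdir).free >= scratch_bytes_needed


def slurm_header(threads: int, mem: str) -> str:
    """Illustrative sbatch header requesting one CPU per fasterq-dump thread.

    `mem` (e.g. "8G") is a guess to tune per accession, not an NCBI figure.
    """
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --cpus-per-task={clamp_threads(threads)}",
        f"#SBATCH --mem={mem}",
    ])


if __name__ == "__main__":
    # Placeholder accession; threads=16 is clamped to 8.
    cmd = build_fasterq_dump_cmd("SRRXXXXXXX", threads=16, tmpdir="/tmp", outdir="fastq")
    print(slurm_header(16, "8G"))
    print(" ".join(cmd))
```

Matching `--cpus-per-task` to the clamped thread count avoids reserving cores that, per the observation above, would not make the conversion faster.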