ncbi / sra-human-scrubber

An SRA tool that takes as input a local fastq file from a clinical infection sample, identifies and removes any significant human reads, and outputs the edited (cleaned) fastq file that can safely be used for SRA submission.

Out of space (/tmp) during fastq_to_fasta.py #22

Closed accopeland closed 1 year ago

accopeland commented 1 year ago

Call it a misconfiguration if you like, but I ran out of "disk" attempting to run scrub.sh on a modest (22G) fastq file due to my /tmp directory being 63G on a shared system.

Debugging this issue, I found that while scrub.sh requires 'fastq' as input, internally the scrub wrapper calls a python script to convert fastq to fasta by writing a file to /tmp. I suspect this is likely to become a performance issue, for both speed and space, when processing fastq files produced by modern sequencers.

An option to specify TMP would be a reasonable workaround for the disk issue. If you feel the conversion to fasta is essential, then a faster method would be welcome.
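
To illustrate what a faster, streaming conversion might look like, here is a rough sketch using awk. It is not the method the scrubber uses. The function name fastq_to_fasta is hypothetical, and the sketch assumes strict four-line fastq records (header, sequence, '+', qualities) with no wrapped sequence lines. Whether the downstream steps of scrub.sh could read from a pipe instead of a temporary file is also an assumption.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read fastq on stdin, write fasta on stdout.
# Assumes every record is exactly four lines (no multi-line sequences).
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }  # header line: replace leading "@" with ">"
        NR % 4 == 2 { print }                    # sequence line: copy unchanged
    '
}

# Example with two tiny records; nothing is written to /tmp.
printf '@read1 sample\nACGT\n+\nIIII\n@read2\nGG\n+\nII\n' | fastq_to_fasta
```

Because the conversion streams record by record, its disk footprint is zero and its memory use is constant, whatever the size of the input.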

Also, with a bash driver containing pipes, set -eou pipefail could be a good idea.
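
A small sketch of why pipefail matters for a driver built on pipes. The variable names are only for illustration. By default bash reports the exit status of the last command in a pipeline, so a failure earlier in the pipe goes unnoticed.

```shell
#!/usr/bin/env bash

# Default behaviour: the pipeline status is that of the last command (cat).
set +o pipefail
false | cat
status_without=$?

# With pipefail: the status is that of the rightmost command that failed.
set -o pipefail
false | cat
status_with=$?

echo "without pipefail: $status_without"
echo "with pipefail:    $status_with"
```

Combined with -e, the second case would stop the script at the failing pipeline instead of carrying on with incomplete output.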

multikengineer commented 1 year ago

Apologies for the problem you encountered. I added the pipefail flag, as you rightly suggested. As you probably know, since the script uses mktemp, you can set $TMPDIR to whatever you want and mktemp will use that before defaulting to /tmp. Hope it helps.
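
A minimal sketch of the TMPDIR workaround described above. The scratch directory name is hypothetical; in practice it would point at a filesystem with enough free space. The sketch only shows that mktemp, when called without a template, creates its file under $TMPDIR. It does not run scrub.sh itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: a scratch location with plenty of space. A directory under
# the current working directory is used here purely for illustration.
scratch="$PWD/scrub_tmp"
mkdir -p "$scratch"

# Export so that child processes, including a wrapper script's mktemp
# calls, inherit the setting.
export TMPDIR="$scratch"

# Called without a template, mktemp places the file under $TMPDIR
# rather than /tmp.
tmpfile="$(mktemp)"
echo "temporary file created at: $tmpfile"

# Clean up the demonstration file.
rm -f "$tmpfile"
```

Running the scrubber from a shell where TMPDIR has been exported this way should send its intermediate files to the scratch directory instead of /tmp.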