optimuscoprime / autoadapt

Automatic quality control for FASTQ sequencing files
GNU General Public License v2.0

splitFile(): use GNU split for ultimate memory efficiency #6

Open spock opened 8 years ago

spock commented 8 years ago

Hi,

I've been benchmarking a few adapter trimmers for my purposes, including autoadapt.

I found that the FASTQ splitting stage easily consumes 16+ GB of RAM plus 6+ GB of swap on 2 x 8.4 GB FASTQ files, at which point I killed the process. I've replaced splitFile() with a similarly behaving call to GNU split, which is fast and far more memory-efficient. The only downside is that I first have to count the reads in the input file. This is minor, because the counting pass pulls the file into the OS cache, which in turn helps split.
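To illustrate the idea, here is a rough sketch of such a replacement in Python. It is not the code from autoadapt or from my patch: the names `count_lines` and `split_fastq`, the chunk prefix, and the 3-letter suffix length are all made up for the example. It assumes plain 4-line FASTQ records, a file that ends with a newline, and a `split` binary on the PATH.

```python
import math
import subprocess
from pathlib import Path


def count_lines(path, buf_size=1 << 20):
    """Count newlines by streaming fixed-size blocks, so memory use stays flat."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            block = handle.read(buf_size)
            if not block:
                break
            total += block.count(b"\n")
    return total


def split_fastq(path, n_chunks, prefix):
    """Split a FASTQ file into at most n_chunks pieces on record boundaries.

    Hypothetical helper: assumes 4 lines per read and a trailing newline.
    """
    n_reads = count_lines(path) // 4
    if n_reads == 0:
        return []
    # Round up so that no more than n_chunks files are produced.
    reads_per_chunk = math.ceil(n_reads / n_chunks)
    # -l: lines per output file (a multiple of 4 keeps records intact)
    # -a: suffix length, giving names like <prefix>aaa, <prefix>aab, ...
    subprocess.run(
        ["split", "-l", str(reads_per_chunk * 4), "-a", "3", str(path), str(prefix)],
        check=True,
    )
    prefix = Path(prefix)
    return sorted(prefix.parent.glob(prefix.name + "???"))
```

Something like `split_fastq("reads_1.fastq", 8, "tmp/reads_1.part_")` would then return the sorted chunk paths. The copying is done by split itself, so Python never holds more than one read buffer in memory.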

The mergeFiles() stage consumed about 2.5 GB of RAM on the processed files. That is too much for simple file merging, but at least it fits within the 16 GB limit I have right now.
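For comparison, merging can be done in roughly constant memory by streaming each part into the output. The sketch below is only an illustration of that approach, not the existing mergeFiles() implementation. The name `merge_files` and the 1 MiB buffer size are assumptions for the example.

```python
import shutil


def merge_files(part_paths, out_path, buf_size=1 << 20):
    """Concatenate parts into out_path, copying one buffer at a time."""
    with open(out_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # copyfileobj reads and writes buf_size bytes per step,
                # so peak memory does not depend on the file sizes.
                shutil.copyfileobj(src, out, buf_size)
```

The parts have to be passed in the order they should appear in the merged file, for example the sorted list of chunk names.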