wdecoster / chopper

MIT License

enhance -i performance #32

Closed JMencius closed 3 months ago

JMencius commented 3 months ago

Hi @wdecoster, in the last version I submitted, I observed a significant drop in performance when using -i or --input with .gz files. I made some modifications to the code to improve --input performance. Briefly:

  1. Use a different flate2 backend to achieve the best performance, as mentioned in https://github.com/rust-lang/flate2-rs#Backends
  2. Add a 512 k buffer.

The performance is shown below:

| Data | File size | Command | Version | Run time |
|---|---|---|---|---|
| DM.fastq.gz | 21G | `gunzip -c DM.fastq.gz \| ./chopper -q 10 -l 500 > test.fastq` | Old version (0.8.0) | 658 s |
| DM.fastq.gz | 21G | `./chopper -i DM.fastq.gz -q 10 -l 500 > test.fastq` | Old version (0.8.0) | 3060 s |
| DM.fastq.gz | 21G | `./chopper -i DM.fastq.gz -q 10 -l 500 > test.fastq` | Current pull request version | 759 s |

This is still slower than piping from system-level gunzip (759 s vs. 658 s), but close.
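
For readers who want to see the buffering idea in isolation, here is a minimal sketch using only the standard library. It is not the code from this pull request: `buffered` and `count_records` are hypothetical helpers, and the capacity assumes that "512 k" means 512 * 1024 bytes. In chopper the inner reader would presumably be a flate2 gzip decoder rather than the in-memory `Cursor` used here, and the flate2 backend is selected through Cargo features as described in the flate2 README.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

// Assumption: "512 k" is taken to mean 512 * 1024 bytes.
const BUF_CAPACITY: usize = 512 * 1024;

/// Hypothetical helper: wrap any reader in a large buffer so that the
/// underlying source (for example a gzip decoder) is asked for data in
/// big chunks instead of many small reads.
fn buffered<R: Read>(inner: R) -> BufReader<R> {
    BufReader::with_capacity(BUF_CAPACITY, inner)
}

/// Hypothetical helper: count FASTQ records, assuming exactly four
/// lines per record (header, sequence, separator, qualities).
fn count_records<R: Read>(inner: R) -> std::io::Result<usize> {
    let mut lines = 0usize;
    for line in buffered(inner).lines() {
        line?; // propagate I/O or UTF-8 errors
        lines += 1;
    }
    Ok(lines / 4)
}

fn main() -> std::io::Result<()> {
    // In-memory stand-in for a decompressed FASTQ stream.
    let fastq = b"@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n";
    let n = count_records(Cursor::new(&fastq[..]))?;
    println!("{n} records");
    Ok(())
}
```

Because `buffered` is generic over `Read`, the same wrapper can sit on top of a file, stdin, or a decoder without changes to the record-processing code.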

wdecoster commented 3 months ago

Awesome!