natir closed this issue 1 day ago.
Hey @natir.
Yes, I only just switched the parsing to needletail, so I probably won't get around to switching parsers again anytime soon. Also, compile time isn't a major concern for me, especially since I distribute pre-compiled binaries and offer several other installation methods that don't require users to compile the project. I'm happy to review a PR with an updated benchmark, though.
Regarding niffler, you've made me realise that somewhere along the line I lost this tool's compressed output functionality. Originally, rasusa would infer the desired output compression from the output path. I'll have to fix that.
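A minimal sketch of that kind of path-based inference, assuming the file extension alone decides the format. `Compression` and `infer_compression` are hypothetical names, not rasusa's or niffler's actual API; in practice the chosen variant would then be handed to whatever writer the compression crate (e.g. niffler) provides.

```rust
use std::path::Path;

/// Hypothetical output compression formats (illustrative only; not
/// rasusa's or niffler's real type).
#[derive(Debug, PartialEq, Eq)]
enum Compression {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Uncompressed,
}

/// Infer the desired output compression from the last file extension,
/// e.g. `reads.fq.gz` -> Gzip. Matching is case-insensitive.
fn infer_compression(path: &Path) -> Compression {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("gz") | Some("gzip") => Compression::Gzip,
        Some("bz") | Some("bz2") => Compression::Bzip2,
        Some("xz") => Compression::Xz,
        Some("zst") | Some("zstd") => Compression::Zstd,
        // Anything else (including no extension) is written uncompressed.
        _ => Compression::Uncompressed,
    }
}

fn main() {
    for p in ["reads.fq.gz", "reads.fastq", "out.fa.XZ", "sub.fq.zst"] {
        println!("{p} -> {:?}", infer_compression(Path::new(p)));
    }
}
```

Keying on the extension keeps the CLI simple (no extra flag), at the cost of trusting the user's file naming.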
I have always been curious about the speed difference between these two, so I ran a small benchmark:
tl;dr: needletail is faster, by roughly 3x (about 2.43 s vs 7.71 s per run).
```
Benchmarking needletail FASTQ parsing: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 49.7s, or reduce sample count to 10.
needletail FASTQ parsing
                        time:   [2.4155 s 2.4311 s 2.4522 s]
                        change: [-0.2443% +0.4751% +1.3345%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) high mild
  1 (5.00%) high severe

Benchmarking noodles-fastq FASTQ parsing: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 157.0s, or reduce sample count to 10.
noodles-fastq FASTQ parsing
                        time:   [7.6495 s 7.7057 s 7.7732 s]
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) high mild
```
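For a rough sense of the gap, the point estimates (the middle value of each `time` triple) can simply be divided. `Timing`, `speedup` and `speedup_bounds` are illustrative names; the bounds are only a crude bracket built from the interval ends, not a proper statistical interval for the ratio.

```rust
/// Criterion prints each timing as `[lower estimate upper]`: the middle value
/// is the point estimate, the outer values bound its confidence interval.
struct Timing {
    lower: f64,
    estimate: f64,
    upper: f64,
}

/// How many times slower `slow` is than `fast`, using point estimates.
fn speedup(slow: &Timing, fast: &Timing) -> f64 {
    slow.estimate / fast.estimate
}

/// Crude bracket: pair the least and most favourable ends of both intervals.
fn speedup_bounds(slow: &Timing, fast: &Timing) -> (f64, f64) {
    (slow.lower / fast.upper, slow.upper / fast.lower)
}

fn main() {
    // Seconds, copied from the criterion output above.
    let needletail = Timing { lower: 2.4155, estimate: 2.4311, upper: 2.4522 };
    let noodles = Timing { lower: 7.6495, estimate: 7.7057, upper: 7.7732 };

    let (lo, hi) = speedup_bounds(&noodles, &needletail);
    println!(
        "needletail is ~{:.2}x faster (crude range {:.2}x to {:.2}x)",
        speedup(&noodles, &needletail),
        lo,
        hi
    );
}
```

With the numbers above this prints `needletail is ~3.17x faster (crude range 3.12x to 3.22x)`.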
Hello, very nice work.
Needletail is a very nice crate, but unless I'm mistaken you only use it for FASTX parsing and don't use any of its other functionality.
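For context, "FASTX parsing" here means reading FASTA/FASTQ records. Below is a deliberately naive, std-only sketch of reading strict four-line FASTQ, just to show the kind of work either crate does. `Record` and `parse_fastq` are hypothetical; this is neither needletail's nor noodles' API, and real parsers handle buffering, multi-line FASTA and performance far more carefully.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// One FASTQ record: identifier (without '@'), sequence and quality string.
#[derive(Debug, PartialEq)]
struct Record {
    id: String,
    seq: String,
    qual: String,
}

/// Parse strict four-line FASTQ records. Naive sketch: assumes no line
/// wrapping and does only minimal validation.
fn parse_fastq<R: Read>(reader: R) -> io::Result<Vec<Record>> {
    let mut lines = BufReader::new(reader).lines();
    let mut records = Vec::new();
    while let Some(header) = lines.next() {
        let header = header?;
        if header.is_empty() {
            continue; // tolerate blank lines between/after records
        }
        let id = header
            .strip_prefix('@')
            .ok_or_else(|| invalid("header must start with '@'"))?
            .to_string();
        let seq = next_line(&mut lines)?;
        let plus = next_line(&mut lines)?;
        if !plus.starts_with('+') {
            return Err(invalid("separator line must start with '+'"));
        }
        let qual = next_line(&mut lines)?;
        if qual.len() != seq.len() {
            return Err(invalid("sequence and quality lengths differ"));
        }
        records.push(Record { id, seq, qual });
    }
    Ok(records)
}

/// Fetch the next line, treating end-of-input mid-record as an error.
fn next_line<B: BufRead>(lines: &mut io::Lines<B>) -> io::Result<String> {
    lines.next().unwrap_or_else(|| Err(invalid("truncated record")))
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

fn main() -> io::Result<()> {
    let data = "@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n#I\n";
    for rec in parse_fastq(data.as_bytes())? {
        println!("{} {} {}", rec.id, rec.seq, rec.qual);
    }
    Ok(())
}
```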
Noodles is a crate that provides many bioinformatics parsers, with a feature system that lets you pull in only what you need. By switching to noodles you could reduce the number of rasusa's dependencies and speed up compilation. I haven't run a full benchmark, but noodles and needletail have almost the same code.
If you want to keep similar functionality (compression support), you would also need to add niffler, which provides simple, transparent support for compressed files.
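A common way to make decompression "transparent" is to sniff the first few bytes of the input for a known magic number, then put those bytes back in front of the stream before decoding. As I understand it niffler works along these lines, but the sketch below is std-only and hypothetical (`Detected`, `sniff` and `detect` are not niffler's API); a real tool would hand the returned stream to the matching decoder.

```rust
use std::io::{self, Read};

/// Compression formats recognisable from their leading magic bytes
/// (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
enum Detected {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Uncompressed,
}

/// Classify a stream from its first bytes.
fn sniff(prefix: &[u8]) -> Detected {
    if prefix.starts_with(&[0x1f, 0x8b]) {
        Detected::Gzip
    } else if prefix.starts_with(b"BZh") {
        Detected::Bzip2
    } else if prefix.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        Detected::Xz
    } else if prefix.starts_with(&[0x28, 0xb5, 0x2f, 0xfd]) {
        Detected::Zstd
    } else {
        Detected::Uncompressed
    }
}

/// Read up to 6 bytes (the longest magic number above), classify them, and
/// return a reader that replays those bytes before the rest of the input.
fn detect<R: Read>(mut reader: R) -> io::Result<(Detected, impl Read)> {
    let mut buf = [0u8; 6];
    let mut filled = 0;
    while filled < buf.len() {
        let n = reader.read(&mut buf[filled..])?;
        if n == 0 {
            break; // input shorter than 6 bytes
        }
        filled += n;
    }
    let prefix = buf[..filled].to_vec();
    let format = sniff(&prefix);
    Ok((format, io::Cursor::new(prefix).chain(reader)))
}

fn main() -> io::Result<()> {
    let gz_like: &[u8] = &[0x1f, 0x8b, 0x08, 0x00];
    let (format, mut stream) = detect(gz_like)?;
    let mut all = Vec::new();
    stream.read_to_end(&mut all)?;
    println!("{format:?}, {} bytes still readable", all.len());
    Ok(())
}
```

Re-chaining the sniffed prefix matters: without it, the decoder (or the FASTQ parser, for plain input) would miss the first bytes of the file.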
Again, very nice work.