mbhall88 / rasusa

Randomly subsample sequencing reads or alignments
https://doi.org/10.21105/joss.03941
MIT License

Suggestion: replace needletail by noodles and niffler #25

Open natir opened 3 years ago

natir commented 3 years ago

Hello, very nice work.

Needletail is a very nice crate, but unless I'm mistaken, you use it only for FASTX parsing and none of its other functionality.

Noodles is a crate that provides many bioinformatics parsers, along with a feature system that lets you pull in only what you need. By switching to noodles, you could reduce the number of dependencies in rasusa and speed up compilation. I haven't run a full benchmark, but noodles and needletail have almost the same code.

If you want to keep similar functionality (support for compression), you would also need to add niffler, which provides simple and transparent support for compressed files.
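To illustrate what "transparent" support for compressed input can mean, here is a minimal sketch that guesses the compression format from the first bytes of a file (its magic number). It uses only the standard library and is not niffler's API; the `Detected` enum and the `sniff_compression` function are hypothetical names invented for this example.

```rust
/// Formats this sketch can recognise (hypothetical type, not from niffler).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Detected {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Uncompressed,
}

// Well-known magic numbers found at the start of each format.
const GZIP_MAGIC: &[u8] = &[0x1f, 0x8b];
const BZIP2_MAGIC: &[u8] = b"BZh";
const XZ_MAGIC: &[u8] = &[0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00];
const ZSTD_MAGIC: &[u8] = &[0x28, 0xb5, 0x2f, 0xfd];

/// Guess the compression format from the leading bytes of a stream.
/// Anything that matches no known magic number is treated as uncompressed.
pub fn sniff_compression(header: &[u8]) -> Detected {
    if header.starts_with(GZIP_MAGIC) {
        Detected::Gzip
    } else if header.starts_with(BZIP2_MAGIC) {
        Detected::Bzip2
    } else if header.starts_with(XZ_MAGIC) {
        Detected::Xz
    } else if header.starts_with(ZSTD_MAGIC) {
        Detected::Zstd
    } else {
        Detected::Uncompressed
    }
}
```

A caller would read the first few bytes of the input, pass them to `sniff_compression`, and then wrap the reader in the matching decoder before handing it to the FASTX parser.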

Again, very nice work.

mbhall88 commented 3 years ago

Hey @natir.

Yes, I only just switched the parsing to needletail, so I probably won't get around to switching it again anytime soon. Also, compile time isn't a major concern for me, especially since I distribute pre-compiled binaries and provide a bunch of other installation methods that mean users don't need to compile the project. I'm happy to review a PR with updated benchmarks though.

Regarding niffler, you've made me realise that somewhere along the line I have lost the compressed output functionality of this tool... Originally rasusa would infer the desired output compression from the path. I'll have to fix that.
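As a rough illustration of inferring output compression from the path, the sketch below maps the final file extension to a format using only the standard library. It is not rasusa's actual implementation; the `Compression` enum, the `compression_from_path` function, and the set of recognised extensions are assumptions made for this example.

```rust
use std::path::Path;

/// Output formats this sketch knows about (hypothetical type, not rasusa's own).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Compression {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Uncompressed,
}

/// Guess the desired output compression from the last extension of `path`.
/// The comparison is case-insensitive; unknown or missing extensions
/// fall back to uncompressed output.
pub fn compression_from_path(path: &Path) -> Compression {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("gz") => Compression::Gzip,
        Some("bz2") => Compression::Bzip2,
        Some("xz") => Compression::Xz,
        Some("zst") => Compression::Zstd,
        _ => Compression::Uncompressed,
    }
}
```

With this approach, an output path such as `out.fq.gz` would select gzip, while `out.fq` would be written uncompressed. An explicit command-line option could still override the guess when the extension is ambiguous.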