suchapalaver / krust

Bioinformatics 101 tool for counting unique k-length substrings in DNA
MIT License
30 stars 5 forks
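As a point of reference for what krust computes, here is a minimal Rust sketch of counting k-mers (k-length substrings) in a single DNA sequence with a `HashMap`. It is not krust's actual implementation. Skipping windows that contain anything other than uppercase A, C, G or T is an assumption made for this sketch.

```rust
use std::collections::HashMap;

/// Count how often each k-length substring (k-mer) occurs in `seq`.
/// Assumption for this sketch: windows containing any byte other than
/// uppercase A, C, G or T are skipped rather than counted.
fn count_kmers(seq: &[u8], k: usize) -> HashMap<Vec<u8>, usize> {
    let mut counts = HashMap::new();
    // `windows(0)` panics, so treat k == 0 as "no k-mers".
    if k == 0 {
        return counts;
    }
    // Slide a window of width k across the sequence, one base at a time.
    for window in seq.windows(k) {
        if window.iter().all(|&b| matches!(b, b'A' | b'C' | b'G' | b'T')) {
            *counts.entry(window.to_vec()).or_insert(0) += 1;
        }
    }
    counts
}
```

For example, `count_kmers(b"ACGTACGT", 3)` finds four distinct 3-mers: ACG and CGT twice each, GTA and TAC once each.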

Use needletail rather than bio to accelerate FASTA parsing #15

Closed: jianshu93 closed this issue 1 year ago

jianshu93 commented 2 years ago

Hello Team,

It seems needletail is much faster than bio for FASTA file parsing. For larger FASTA files, the work could also be parallelized across records. Is this doable?
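To illustrate the kind of parallelism being suggested, here is a hedged Rust sketch that uses only the standard library. A tiny hand-written FASTA parser stands in for bio or needletail (neither crate's API is used), and it is the per-record k-mer counting, not the parsing itself, that is spread across scoped threads: each thread counts its own chunk of records into a private `HashMap`, and the partial maps are merged at the end. The names `parse_fasta`, `count_into` and `parallel_count` are hypothetical.

```rust
use std::collections::HashMap;
use std::thread;

/// Minimal FASTA parser (a stand-in for bio or needletail): returns one owned,
/// uppercased sequence per record, joining wrapped lines and ignoring headers.
fn parse_fasta(text: &str) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    let mut current: Option<Vec<u8>> = None;
    for line in text.lines() {
        if line.starts_with('>') {
            // A new header closes the previous record, if any.
            if let Some(seq) = current.take() {
                records.push(seq);
            }
            current = Some(Vec::new());
        } else if let Some(seq) = current.as_mut() {
            seq.extend(line.trim().bytes().map(|b| b.to_ascii_uppercase()));
        }
    }
    if let Some(seq) = current {
        records.push(seq);
    }
    records
}

/// Add the ACGT-only k-mers of `seq` to `counts`.
fn count_into(seq: &[u8], k: usize, counts: &mut HashMap<Vec<u8>, usize>) {
    if k == 0 {
        return;
    }
    for w in seq.windows(k) {
        if w.iter().all(|&b| matches!(b, b'A' | b'C' | b'G' | b'T')) {
            *counts.entry(w.to_vec()).or_insert(0) += 1;
        }
    }
}

/// Split the records across up to `n_threads` scoped threads, count k-mers
/// into one private map per thread, then merge the partial maps.
fn parallel_count(records: &[Vec<u8>], k: usize, n_threads: usize) -> HashMap<Vec<u8>, usize> {
    let n = n_threads.max(1);
    // Ceiling division, clamped to 1 because `chunks(0)` panics.
    let chunk_size = ((records.len() + n - 1) / n).max(1);
    let partials: Vec<HashMap<Vec<u8>, usize>> = thread::scope(|s| {
        let handles: Vec<_> = records
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    let mut local = HashMap::new();
                    for seq in chunk {
                        count_into(seq, k, &mut local);
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Merge step: sum the per-thread counts for each k-mer.
    let mut total = HashMap::new();
    for partial in partials {
        for (kmer, count) in partial {
            *total.entry(kmer).or_insert(0) += count;
        }
    }
    total
}
```

Per-thread maps avoid lock contention at the cost of a final merge. Note that this sketch reads every record into memory before counting; a streaming design would need the records handed to workers as they are parsed.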

Thanks,

Jianshu

suchapalaver commented 2 years ago

Sorry for the delayed reply. I'll look into this. Thanks for the suggestion.

suchapalaver commented 1 year ago

@jianshu93 Check out this branch for experimenting with needletail.

suchapalaver commented 1 year ago

@jianshu93, I'd be interested to hear whether you think this is an improvement. I couldn't easily figure out how to parallelize the processing of the FASTA records, given how needletail implements next().
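One likely obstacle, assuming needletail's reader behaves like a lending iterator (each record returned by next() borrows the reader's internal buffer and is only valid until the following call), is that those records cannot be sent to other threads as-is, and the reader cannot implement the standard `Iterator` trait. A common workaround is to copy each record into an owned `Vec<u8>` on the reading thread and fan the copies out to worker threads over a channel. The Rust sketch below models this with a hypothetical `LendingReader` (not needletail's API) and a placeholder GC-counting workload in the hypothetical `parallel_gc`.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical streaming reader (not needletail's API): `next_record` returns
/// a slice borrowing the reader's internal buffer, valid only until the next
/// call. Because the item borrows `self`, this cannot implement `Iterator`.
struct LendingReader {
    data: Vec<String>, // one pre-split sequence per entry, to keep the toy small
    pos: usize,
    buf: Vec<u8>,
}

impl LendingReader {
    fn new(records: &[&str]) -> Self {
        LendingReader {
            data: records.iter().map(|s| s.to_string()).collect(),
            pos: 0,
            buf: Vec::new(),
        }
    }

    fn next_record(&mut self) -> Option<&[u8]> {
        let src = self.data.get(self.pos)?;
        // Reuse one buffer, as streaming parsers do to avoid allocations.
        self.buf.clear();
        self.buf.extend_from_slice(src.as_bytes());
        self.pos += 1;
        Some(self.buf.as_slice())
    }
}

/// Read on the current thread, copy each borrowed record into an owned Vec,
/// and hand the copies to `n_workers` threads over a bounded channel.
/// The per-record work here (counting G/C bases) is only a placeholder.
fn parallel_gc(mut reader: LendingReader, n_workers: usize) -> usize {
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(64);
    let rx = Arc::new(Mutex::new(rx));
    let workers: Vec<_> = (0..n_workers.max(1))
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut gc = 0;
                loop {
                    // The lock guard is dropped at the end of this statement,
                    // so other workers can receive while this one computes.
                    let msg = rx.lock().unwrap().recv();
                    match msg {
                        Ok(seq) => gc += seq.iter().filter(|&&b| b == b'G' || b == b'C').count(),
                        Err(_) => break, // sender dropped and queue drained
                    }
                }
                gc
            })
        })
        .collect();
    while let Some(record) = reader.next_record() {
        // `record` borrows `reader`, so copy it before sending it across threads.
        tx.send(record.to_vec()).unwrap();
    }
    drop(tx); // close the channel so workers exit their loops
    workers.into_iter().map(|h| h.join().unwrap()).sum()
}
```

For example, `parallel_gc(LendingReader::new(&["GGCC", "ATAT", "GCAT"]), 3)` returns 6. The copy costs one allocation per record, which is often small next to per-record work such as k-mer counting; sending batches of records per message would further reduce channel overhead.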