suchapalaver / krust

Bioinformatics 101 tool for counting unique k-length substrings in DNA
MIT License
30 stars, 5 forks

speed up by changing the utf8 processing, reverse-comp, and storage #10

Closed suchapalaver closed 3 years ago

suchapalaver commented 3 years ago

the utf8-processing of the kmers. The kmer iterator itself should check that each kmer is valid while iterating. Also, instead of storing the reverse complement in heap-allocated strings, you could make a lazy reverse-complemented object that computes it on demand. Alternatively, store the kmers as u64 values: one of the reasons for using kmers in the first place is that they can be packed into machine integers for speed.
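For illustration, here is a minimal sketch of the u64 idea, not krust's actual code. It assumes a 2-bit encoding (A=0, C=1, G=2, T=3), which limits k to 32, and it assumes that counting canonical kmers (the smaller of a kmer and its reverse complement) is what is wanted. The names `encode_base`, `pack_kmer`, `reverse_complement`, and `count_canonical_kmers` are hypothetical. Validation happens while iterating: any window containing a byte outside ACGT is skipped, and no heap-allocated string is created for the reverse complement.

```rust
use std::collections::HashMap;

/// Hypothetical helper: map a nucleotide byte to an assumed 2-bit code.
/// Returns None for anything that is not A, C, G, or T (either case).
fn encode_base(b: u8) -> Option<u64> {
    match b {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None,
    }
}

/// Pack a kmer of length 1..=32 into a u64, two bits per base.
/// Returns None if the kmer is empty, too long, or contains an invalid byte.
fn pack_kmer(kmer: &[u8]) -> Option<u64> {
    if kmer.is_empty() || kmer.len() > 32 {
        return None;
    }
    kmer.iter()
        .try_fold(0u64, |acc, &b| Some((acc << 2) | encode_base(b)?))
}

/// Reverse complement of a packed kmer of length k, computed on integers.
/// With this encoding, flipping both bits of a base gives its complement
/// (A <-> T, C <-> G); the loop then reverses the order of the 2-bit groups.
fn reverse_complement(packed: u64, k: usize) -> u64 {
    let mut x = !packed;
    let mut rc = 0u64;
    for _ in 0..k {
        rc = (rc << 2) | (x & 3);
        x >>= 2;
    }
    rc
}

/// Count canonical kmers, skipping windows with invalid bases as we iterate.
fn count_canonical_kmers(seq: &[u8], k: usize) -> HashMap<u64, u32> {
    let mut counts = HashMap::new();
    // `windows` panics on a size of 0, and k > 32 does not fit in a u64.
    if k == 0 || k > 32 {
        return counts;
    }
    for window in seq.windows(k) {
        if let Some(packed) = pack_kmer(window) {
            let canonical = packed.min(reverse_complement(packed, k));
            *counts.entry(canonical).or_insert(0) += 1;
        }
    }
    counts
}
```

Calling `count_canonical_kmers(b"ACGT", 2)` groups `AC` and `GT` under one key, since they are reverse complements of each other. This sketch re-packs every window from scratch for clarity; a rolling update that shifts in one base per step would avoid the repeated work.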