refresh-bio / KMC

Fast and frugal disk based k-mer counter

Producing smaller k-mers from databases with larger k-values #165

Closed (jamshed closed 2 years ago)

jamshed commented 3 years ago

Hello @marekkokot, thanks again for the awesome library! We have been using KMC3 for some research projects in our lab, such as our tool cuttlefish (manuscript and repo), and it performs really well for our requirements. A need has come up lately for a utility that produces a k-mer database from an existing (k + x)-mer database (with x > 0); having it would save a lot of computational work. This is the general objective.

To be more specific about our needs, say that we have built a (k + 1)-mer database from a huge amount of input data. Our application also needs all the k-mers, which could be produced the same way from the input. But instead of going over the entire input again, it would be great to have a procedure that generates the k-mers from the already produced (k + 1)-mers, saving a lot of computation.
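
To illustrate the idea, here is a minimal Python sketch, not KMC code and not necessarily how KMC would do it. It assumes canonical k-mers held as plain strings in memory, and the function names (`canonical`, `enumerate_canonical`, `kmers_from_larger`) are made up for this example. It recovers only the set of k-mers. Exact counts would need extra care, since the last k-mer of each input sequence is not the prefix of any (k + 1)-mer, and input sequences of length exactly k would be missed entirely.

```python
# Hypothetical sketch: derive the set of canonical k-mers from a set of
# canonical (k + x)-mers, without rereading the original input.

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def enumerate_canonical(seq, k):
    """Directly enumerate the canonical k-mers of a sequence (the expensive pass)."""
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def kmers_from_larger(larger_kmers, k):
    """Collect every canonical k-mer that occurs inside the given larger k-mers."""
    result = set()
    for larger in larger_kmers:
        if len(larger) <= k:
            raise ValueError("each stored k-mer must be longer than k")
        # Every length-k window of a stored (k + x)-mer is a k-mer of the input.
        for i in range(len(larger) - k + 1):
            result.add(canonical(larger[i:i + k]))
    return result
```

For a single sequence of length at least k + 1, `kmers_from_larger(enumerate_canonical(seq, k + 1), k)` yields the same set as `enumerate_canonical(seq, k)`, because every k-mer of the sequence lies inside at least one (k + 1)-mer.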

I'm also tagging @rob-p here, as both of us are quite keen to have support for this utility. It would be really great if you could have a look into this!

Regards.

jamshed commented 2 years ago

Support added in v3.2.0. Thanks again for the collaboration!

marekkokot commented 2 years ago

Thank you as well :-) hoping for a future collab!