refresh-bio / KMC

Fast and frugal disk based k-mer counter

OpenForRA method with optional minimum and maximum parameters #129

tbenavi1 closed 4 years ago

tbenavi1 commented 4 years ago

Hello,

For the OpenForRA method, is it possible to load only those k-mers whose counters fall between a given minimum and maximum value?

Unless I am wrong, it seems the minimum and maximum values can only be set once the entire KMC database has been loaded. This, however, would use much more memory than necessary.

Thanks for any assistance.

marekkokot commented 4 years ago

Hi,

thanks for using KMC.

You are right. For now, there is no option to filter out k-mers while a KMC database is being loaded. However, you can filter the k-mers with kmc_tools first, before using the KMC API at all. The kmc_tools documentation is here (check out page 8, operation group transform, operation reduce).
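As a rough illustration of that workaround, here is a minimal Python sketch that builds and runs a `kmc_tools transform ... reduce` call. It assumes `kmc_tools` is on the PATH and that `-ci` and `-cx` given after the output name set the minimum and maximum counter to keep, as described in the kmc_tools documentation. The helper names and the database names `kmers` and `kmers_filtered` are hypothetical, not part of KMC.

```python
import shutil
import subprocess


def build_reduce_command(input_db, output_db, min_count, max_count):
    """Build the argument list for a kmc_tools 'transform ... reduce' call.

    input_db and output_db are database names without the .kmc_pre/.kmc_suf
    extensions. The -ci/-cx placement follows the kmc_tools documentation as
    understood here; verify it against your installed version.
    """
    if min_count < 1 or max_count < min_count:
        raise ValueError("expected 1 <= min_count <= max_count")
    return [
        "kmc_tools", "transform", input_db,
        "reduce", output_db,
        f"-ci{min_count}",  # drop k-mers occurring fewer than min_count times
        f"-cx{max_count}",  # drop k-mers occurring more than max_count times
    ]


def reduce_database(input_db, output_db, min_count, max_count):
    """Run kmc_tools to write a filtered copy of the database (hypothetical helper)."""
    cmd = build_reduce_command(input_db, output_db, min_count, max_count)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("kmc_tools was not found on PATH")
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Only prints the command; call reduce_database(...) to actually run it.
    print(" ".join(build_reduce_command("kmers", "kmers_filtered", 5, 100)))
```

The reduced database can then be opened with OpenForRA in place of the original, so only k-mers inside the chosen counter range are loaded into memory.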

In general, filtering during loading would probably be faster, but I think my advice should be sufficient. If it is not, let me know and we will consider adding filtering during loading.

best, Marek

tbenavi1 commented 4 years ago

Yes,

This should work for now! Thank you.