Closed by voichek 5 years ago
Hi,
This seems to be a special case of the inline void to_long(std::vector<uint64>& kmer)
method with some edge cases eliminated, so in my opinion this is quite a nice workaround :).
If its performance is still not enough for your purposes, you may try to define your own class to represent k-mers and read the database. This is probably quite a time-consuming task, but if you want to try, then read the KMC database format description in the docs. As your case (k<32) is simpler, you may be able to eliminate some more edge cases. The priority of the KMC API is flexibility and ease of use; performance is the second criterion. In general, I don't think you will gain a huge performance boost by creating your own implementation to access the KMC DB, but some boost should be possible.
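As a rough illustration of the simplification available when k < 32: such a k-mer needs at most 62 bits, so a custom class can hold it in a single 64-bit word. The sketch below is hypothetical. The name `PackedKmer` and the A=0, C=1, G=2, T=3 encoding are assumptions made for the example, and it does not read the KMC database format; that part would still have to follow the format description in the docs.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of the KMC API.
// Stores a k-mer (k <= 32) in one 64-bit word, 2 bits per symbol,
// with the first symbol in the most significant used bits.
struct PackedKmer {
    std::uint64_t bits = 0;
    unsigned k = 0;

    // Assumed encoding for this sketch: A=0, C=1, G=2, T=3.
    static unsigned encode(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw std::invalid_argument("unexpected symbol");
        }
    }

    // Packs a string of A/C/G/T into a single word.
    static PackedKmer from_string(const std::string& s) {
        assert(s.size() <= 32 && "k-mer must fit in a single 64-bit word");
        PackedKmer p;
        p.k = static_cast<unsigned>(s.size());
        for (char c : s) p.bits = (p.bits << 2) | encode(c);
        return p;
    }

    // Decodes the packed word back into text.
    std::string to_string() const {
        static const char symbols[] = "ACGT";
        std::string s(k, 'A');
        for (unsigned i = 0; i < k; ++i) {
            unsigned shift = 2 * (k - 1 - i);
            s[i] = symbols[(bits >> shift) & 3u];
        }
        return s;
    }
};
```

With a single-word layout like this, comparing or hashing two k-mers is one integer operation, which is where the eliminated edge cases come from.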
If you describe your use case in more detail, I may try to help you.
Do you access the database in random access mode or in listing mode? In the first case you may try, for example, sorting the database using kmc_tools, because in some cases querying k-mers may work faster if the database is sorted.
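To illustrate in general terms why sorted data can help random-access queries: if k-mers are held as sorted 64-bit keys, a query becomes a binary search. This is a generic sketch, not a description of how KMC itself performs lookups. The record layout `KmerRecord` and the function name `find_count` are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical in-memory record: (packed k-mer, counter).
using KmerRecord = std::pair<std::uint64_t, std::uint32_t>;

// Looks up a packed k-mer in a vector sorted by key.
// Returns true and sets `count` if the k-mer is present.
bool find_count(const std::vector<KmerRecord>& sorted_db,
                std::uint64_t kmer, std::uint32_t& count) {
    // Debug-only check that the precondition (sorted by key) holds.
    assert(std::is_sorted(sorted_db.begin(), sorted_db.end(),
                          [](const KmerRecord& a, const KmerRecord& b) {
                              return a.first < b.first;
                          }));
    // Binary search for the first record whose key is not less than kmer.
    auto it = std::lower_bound(
        sorted_db.begin(), sorted_db.end(), kmer,
        [](const KmerRecord& rec, std::uint64_t key) { return rec.first < key; });
    if (it != sorted_db.end() && it->first == kmer) {
        count = it->second;
        return true;
    }
    return false;
}
```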
Best, Marek
Dear Marek,
Thank you for your response.
The current (_CKmerAPIupto31bp) implementation provides me with the performance I need. I was worried that the workaround might be incorrect in some cases and wanted to make sure I am not missing anything.
Thanks again, Yoav
Hi,
I am using the KMC API in my code to load k-mer databases created by KMC. I am using k-mers of length <= 31, and I want to get the k-mers in their bit representation.
I can use get_num_symbol, but for efficiency reasons I would like to get the k-mer in the minimum number of operations.
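For reference, the symbol-by-symbol approach could look roughly like the sketch below. It is written against a generic callable rather than the KMC API itself so that it stays self-contained: `symbol_at` is a hypothetical stand-in for a per-position accessor returning a 2-bit code (0 to 3), and `pack_symbols` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Builds a 64-bit representation of a k-mer (k <= 32) by querying one
// symbol at a time. SymbolAt is any callable: unsigned pos -> code in [0, 3].
template <typename SymbolAt>
std::uint64_t pack_symbols(unsigned k, SymbolAt symbol_at) {
    assert(k <= 32 && "k-mer must fit in a single 64-bit word");
    std::uint64_t bits = 0;
    for (unsigned pos = 0; pos < k; ++pos) {
        // One accessor call, one shift and one OR per symbol.
        bits = (bits << 2) | (static_cast<std::uint64_t>(symbol_at(pos)) & 3u);
    }
    return bits;
}
```

The loop costs k accessor calls per k-mer, which is the overhead a direct word-level copy would avoid.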
I thought of this workaround, and I would like feedback on whether it is correct and whether there is another, more natural way to do the same thing:
Thanks for the help, Yoav Voichek