Open lucaparmigiani opened 2 years ago
Hello,
thank you for using KMC. I need to implement handling non-counters in kmc_tools... I'm afraid it may take some time due to other responsibilities.
For now, I have some tricky workaround. You may run kmc
without -cs1
and having k-mer database set all counts to 1 with kmc_tools
(I will not cause not storing counters in the database, so kmc_tools
will not refuse it). Here are example command lines:
kmc -fm -ci1 1.fa o1 .
kmc -fm -ci1 2.fa o2 .
kmc_tools transform o1 set_counts 1 o1_1
kmc_tools transform o2 set_counts 1 o2_1
kmc_tools simple o1_1 o2_1 union o3
Let me know if it works for you. I wonder how it will hurt the performance of your pipeline. If it will be too much let me know and maybe I will be able to prioritize this issue. Another possible (although not recommended) walkaround is to use some older version of KMC).
Please do not close this issue even if it works. I will remind me that it needs to be implemented correctly.
Best, Marek
Thanks for the quick response!
Since I needed this for my experiments I modified a version of yak-count.c and use that for the moment. Thanks for the solution you suggested, I double checked my results with your method so that I was certain I was getting the right numbers.
This "histogram of genome occurence" is function that I definitely would love to see in KMC at some point.
Thank you, Luca
Hi, first of all, thanks for the tool, it is indeed pretty fast and efficient!
I wanted to ask you how to achieve the following:
I have
N
genomes (assembly). I want to get an histogram of how many kmers appear in 1 Genome, 2 genomes...N genomes. So far I cannot do this, even with kmc_tools, since producing an output with-cs1
will make it not readable for kmc_tools.If I use
-cs
> 1 then I cannot apply theunion
with-ocsum
on the two genomes (If I have a kmer appearing twice in the first genome but 0 times in the second genome this will result as if it is present in both).