refresh-bio / KMC

Fast and frugal disk based k-mer counter

Missing k-mer bug #180

Closed: jamshed closed this issue 2 years ago

jamshed commented 2 years ago

Hello @marekkokot, reporting another bug!

I'm using the latest commit 13b9b04120e902d158bd0cb87d83b63b742781b9. Consider the following FASTA file, palindrome.fa:

>palindrome
AACTGACATGTCAGTT
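As the record name suggests, this sequence is a reverse-complement palindrome: it is identical to its own reverse complement. Every k-mer therefore occurs once on each strand, so each canonical k-mer should be counted exactly twice. The short Python sketch below checks that property. It assumes an uppercase A/C/G/T-only sequence, and `reverse_complement` is a hypothetical helper written for this illustration, not part of KMC.

```python
# Sketch (assumes uppercase A/C/G/T input only): check that the sequence
# equals its own reverse complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Hypothetical helper: complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

seq = "AACTGACATGTCAGTT"
print(reverse_complement(seq) == seq)  # True
```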

Executing the following two commands produces the k-mers listed below:

kmc -v -k5 -fa -ci1 -t1 palindrome.fa bug-report.kmc .
kmc_dump bug-report.kmc /dev/stdout
AACTG   2
ACATG   2
ACTGA   2
ATGTC   2
CTGAC   2
AGACA   2

Note that the canonical k-mer TGACA is missing from the output, even though it corresponds to two k-mer instances in the input: TGACA and TGTCA. In addition, the output reports a canonical k-mer AGACA with count 2, although neither of its two instances (AGACA and its reverse complement TGTCT) occurs in the input.
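To make the expected result concrete, the sketch below counts canonical 5-mers of the input by brute force. It assumes, as the report does, that the canonical form of a k-mer is the lexicographically smaller of the k-mer and its reverse complement (A < C < G < T), and it ignores non-ACGT bases. The helper names are hypothetical and this is not KMC's implementation; it only produces a reference list to compare against the `kmc_dump` output.

```python
from collections import Counter

# Sketch: brute-force canonical k-mer counting for a short ACGT-only sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer: str) -> str:
    """Assumed convention: the lexicographically smaller of kmer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))

def count_canonical_kmers(seq: str, k: int) -> Counter:
    """Count canonical forms of every length-k window of seq."""
    return Counter(canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))

counts = count_canonical_kmers("AACTGACATGTCAGTT", 5)
for kmer in sorted(counts):
    print(f"{kmer}\t{counts[kmer]}")
```

Running it lists six canonical 5-mers, each with count 2: AACTG, ACATG, ACTGA, ATGTC, CTGAC and TGACA. TGACA is present and AGACA is absent, which matches the discrepancy described above.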

marekkokot commented 2 years ago

Thank you for reporting this issue! It has been fixed in commit 2fabec79031757afe4adfb09409b74c3d026cab7.