tlemane / kmtricks

modular k-mer count matrix and Bloom filter construction for large read collections
GNU Affero General Public License v3.0

False positive kmers #26

Closed: jmattock5 closed this issue 1 year ago

jmattock5 commented 1 year ago

Hi, thanks for making this great tool! I'm trying to generate kmer presence/absence matrices with these commands:

    kmtricks pipeline --file list_five --run-dir output_5 --cpr --mode kmer:pa:bin
    kmtricks aggregate --run-dir output_5 --pa-matrix kmer --format text --cpr-in --sorted > output.txt

I've noticed that some kmers in the output are reported as present in an assembly, but when I grep the FASTA for them, they aren't there. Are there any settings I can use to prevent this? I've tried --hard-min 1, which didn't help.

Thanks! Jenny

tlemane commented 1 year ago

Hi, sorry for the late reply. In kmtricks, k-mers are always represented in canonical form. Did you try grepping for the reverse complements of the missing k-mers? Note that kmtricks computes the canonical form according to this alphabet order: A < C < T < G. Teo
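A minimal Python sketch of the canonicalization described above, assuming (as is usual for canonical k-mers) that the canonical form is whichever of the k-mer and its reverse complement sorts first, with bases ranked A < C < T < G rather than alphabetically. The names `RANK`, `rank_key` and `canonical` are illustrative only and are not part of kmtricks. Under this order, a sequence containing GGA is reported as TCC, while plain alphabetical order would keep GGA, which is why a grep for the reported k-mer can come up empty.

```python
# Assumption: base ranking used for the canonical comparison is A < C < T < G.
RANK = {"A": 0, "C": 1, "T": 2, "G": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement each base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]


def rank_key(kmer: str) -> tuple:
    """Map a k-mer to base ranks so that min() compares with A < C < T < G."""
    return tuple(RANK[base] for base in kmer)


def canonical(kmer: str) -> str:
    """Return whichever of the k-mer and its reverse complement ranks lower."""
    kmer = kmer.upper()
    return min(kmer, reverse_complement(kmer), key=rank_key)


if __name__ == "__main__":
    print(canonical("GGA"))   # TCC under A < C < T < G
    print(min("GGA", "TCC"))  # GGA under plain alphabetical order
```

The custom key only changes how the two orientations are compared; the reverse complement itself is the same regardless of the ordering.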

jmattock5 commented 1 year ago

Hi, with this information about the alphabet order kmtricks uses, I was able to find the canonical kmers in my FASTA files. Problem solved, thanks! Jenny
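To check a reported k-mer against an assembly, it is enough to search for both the k-mer and its reverse complement. The sketch below uses hypothetical helper names (`read_fasta`, `find_kmer`) in plain Python with no kmtricks API involved. It also joins wrapped FASTA lines before searching, since a line-by-line grep can miss a k-mer that spans a line break.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(kmer: str) -> str:
    """Reverse complement of a DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def read_fasta(text: str) -> dict:
    """Parse FASTA text into {record_id: sequence}, joining wrapped lines."""
    records = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def find_kmer(records: dict, kmer: str) -> list:
    """Return (record_id, strand) pairs where the k-mer occurs on either strand."""
    kmer = kmer.upper()
    rc = reverse_complement(kmer)
    hits = []
    for name, seq in records.items():
        if kmer in seq:
            hits.append((name, "+"))
        # Skip the second search for reverse-complement palindromes such as ACGT.
        if rc != kmer and rc in seq:
            hits.append((name, "-"))
    return hits


if __name__ == "__main__":
    fasta = ">contig1 example\nACGTGG\nATTT\n>contig2\nCCCC\n"
    records = read_fasta(fasta)
    print(find_kmer(records, "ATCC"))  # [('contig1', '-')]: only GGAT is present
```

In this toy input, GGAT spans the line break in contig1, so a plain grep for either orientation would not find it, while the joined sequence does.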