waveygang / wfmash

base-accurate DNA sequence alignments using WFA and mashmap2
MIT License

Fix freq seed count #240

Closed: bkille closed this 1 month ago

bkille commented 3 months ago

Same as #239, but when counting the number of intervals in which a particular hash/seed is sketched, we treat an interval of length x as ceil(x/s) non-overlapping intervals, where s is the segment length. This ensures that frequent k-mers, which often occur within s base pairs of each other and therefore produce long minmer intervals, are not counted only once.
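A minimal C++ sketch of the counting rule described above. It assumes a hypothetical `MinmerInterval` struct holding a seed hash and a half-open `[start, end)` range, plus a hypothetical `countSeedFrequency` helper; neither is taken from the wfmash source, and the real implementation may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical minmer interval: the seed's hash and the half-open range
// [start, end) over which it is sketched. Illustrative only, not wfmash's type.
struct MinmerInterval {
    uint64_t hash;
    int64_t start;
    int64_t end;
};

// For each seed hash, count how many segment-length intervals it is sketched in.
// An interval of length x contributes ceil(x / s) rather than 1, so a frequent
// k-mer that repeats within s bp (yielding one long interval) is not counted once.
std::unordered_map<uint64_t, int64_t> countSeedFrequency(
        const std::vector<MinmerInterval>& intervals, int64_t segmentLength) {
    assert(segmentLength > 0);  // s must be positive for the division below
    std::unordered_map<uint64_t, int64_t> freq;
    for (const auto& iv : intervals) {
        const int64_t len = iv.end - iv.start;
        if (len <= 0) continue;  // ignore empty or malformed intervals
        // Integer ceiling division: ceil(len / s) for positive len and s.
        freq[iv.hash] += (len + segmentLength - 1) / segmentLength;
    }
    return freq;
}
```

For example, with s = 500, a hash with intervals of 1,000 bp and 100 bp counts as 2 + 1 = 3, whereas counting each interval once would give only 2.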