Closed syncmers with lower density

Thanks, Daniel, very interesting work! This will be interesting to try.

Context to Daniel's post:

A closed syncmer is a k-mer whose smallest s-mer is either its first or its last s-mer. We currently sample open syncmers instead: k-mers whose smallest s-mer sits at a fixed offset, in our case the middle.
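To make the two definitions concrete, here is a minimal Python sketch. It is for illustration only and is not strobealign's implementation: the function names are hypothetical, and s-mers are compared lexicographically with ties broken to the left, whereas real tools typically order s-mers by a hash value.

```python
def smallest_smer_offset(kmer: str, s: int) -> int:
    """Offset of the leftmost smallest s-mer in kmer.

    Assumption: plain lexicographic order, used only for illustration.
    """
    n_smers = len(kmer) - s + 1
    # min() returns the first minimal element, so ties go to the left.
    return min(range(n_smers), key=lambda i: kmer[i:i + s])


def is_closed_syncmer(kmer: str, s: int) -> bool:
    """Closed syncmer: the smallest s-mer is the first or the last one."""
    offset = smallest_smer_offset(kmer, s)
    return offset == 0 or offset == len(kmer) - s


def is_open_syncmer(kmer: str, s: int, t: int) -> bool:
    """Open syncmer: the smallest s-mer is at the fixed 0-based offset t."""
    return smallest_smer_offset(kmer, s) == t


# "CGACT" with s=2 has s-mers CG, GA, AC, CT; the smallest (AC) is at offset 2.
print(is_closed_syncmer("CGACT", 2))   # smallest is neither first nor last
print(is_open_syncmer("CGACT", 2, 2))  # smallest is at the chosen offset
```

With k - s + 1 = 5 s-mers per k-mer, "the third s-mer" corresponds to `t = 2` in this 0-based sketch.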

I have tested closed syncmers several times in strobealign, and they never perform quite as well as open syncmers sampled on the middle s-mer (we use the third s-mer when the density is 1/5). This is expected because open syncmers have better spread (a guaranteed lower bound of 3 on the distance between sampled k-mers when the density is 1/5), as shown by Shaw & Yu, 2021. Many traditional closed syncmers (upper panel in Daniel's plot) are sampled at distance 1 from each other, i.e., a poor spread.

However, open syncmers come at the cost of not having a window guarantee, so some regions might be sparsely sampled. Daniel's plot suggests that we may be able to get both a good spread and a window guarantee, ensuring that all regions have enough seeds.
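The spread versus window-guarantee trade-off can be checked empirically by looking at the gaps between consecutive sampled positions. The sketch below is a rough illustration under stated assumptions, not strobealign code: the helper names are hypothetical, s-mers are ordered lexicographically rather than by hash, the input is a random sequence, and k = 8, s = 4 are arbitrary values chosen so that each k-mer has 5 s-mers. Note that with the same k and s, closed syncmers are roughly twice as dense as open ones, so this is not a density-matched comparison.

```python
import random


def smallest_smer_offset(kmer, s):
    """Offset of the leftmost smallest s-mer (lexicographic; an assumption)."""
    return min(range(len(kmer) - s + 1), key=lambda i: kmer[i:i + s])


def sampled_positions(seq, k, s, keep):
    """Start positions of k-mers whose smallest-s-mer offset satisfies keep()."""
    return [i for i in range(len(seq) - k + 1)
            if keep(smallest_smer_offset(seq[i:i + k], s))]


def gap_range(positions):
    """Smallest and largest distance between consecutive sampled positions."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return min(gaps), max(gaps)


def compare_spread(seq, k=8, s=4):
    last = k - s  # offset of the last s-mer in a k-mer
    closed = sampled_positions(seq, k, s, lambda o: o in (0, last))
    open_mid = sampled_positions(seq, k, s, lambda o: o == last // 2)
    return gap_range(closed), gap_range(open_mid)


random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(2000))
closed_gaps, open_gaps = compare_spread(seq)
print("closed (min, max) gap:", closed_gaps)
print("open   (min, max) gap:", open_gaps)
```

In a run like this, the closed syncmers should show a bounded maximum gap (the window guarantee) but can sit at distance 1 from each other, while the open syncmers should never be closer than 3 but have no fixed upper bound on the gap.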

ksahlin / strobealign

Closed syncmers with lower density #429