ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Lower bounding w_min to 1 instead of 0? #416

Open ksahlin opened 2 months ago

ksahlin commented 2 months ago

For logging and coming back to later:

Currently, we have randstrobe(l, u, q, max_dist, std::max(0, k / (k - s + 1) + l), k / (k - s + 1) + u).
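To make the clamp concrete, here is a minimal sketch of the two lower-bound expressions, assuming integer division as in the call above. The helper names (syncmers_per_kmer, w_min_current, w_min_proposed) are hypothetical and not part of strobealign, and the values k = 20, s = 16 are only illustrative.

```cpp
#include <algorithm>

// Hypothetical helpers for illustration; only the arithmetic mirrors the
// randstrobe(...) call quoted above.
constexpr int syncmers_per_kmer(int k, int s) {
    return k / (k - s + 1);  // integer division, as in the original expression
}

// Current behaviour: the lower window bound is clamped at 0.
constexpr int w_min_current(int k, int s, int l) {
    return std::max(0, syncmers_per_kmer(k, s) + l);
}

// Proposal from the issue title: clamp at 1 instead, so the window
// can never start at strobe1 itself.
constexpr int w_min_proposed(int k, int s, int l) {
    return std::max(1, syncmers_per_kmer(k, s) + l);
}

// With k = 20, s = 16 we get 20 / 5 = 4, so l = -4 hits the clamp.
static_assert(w_min_current(20, 16, -4) == 0, "clamped to 0 today");
static_assert(w_min_proposed(20, 16, -4) == 1, "clamped to 1 with the proposal");
static_assert(w_min_current(20, 16, 0) == 4, "unaffected when the sum is positive");
```

Both variants agree whenever k / (k - s + 1) + l >= 1; they differ only in the degenerate case where the sum is 0 or negative.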

The clamp to 0 matters primarily for our parameter optimization: if parameters are chosen such that k / (k - s + 1) + l <= 0, then w_min is set to 0. Because of our link function std::bitset<64> b = (strobe1.hash ^ syncmers[i].hash) & q;, w_min = 0 will deterministically pick strobe1 = strobe2 (the XOR of a hash with itself is 0, the smallest possible bit count) and thus effectively emulate k-mers. For such a setting and some read lengths, I observed a significant drop in map rate (over 5%), increased runtime, and often, but not always, a decrease in accuracy. This may have been the problem with our initial parameter optimization?
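The effect of the link function can be reproduced with a small sketch. This is not strobealign's actual iterator: the Syncmer struct and pick_strobe2 are simplified, hypothetical stand-ins, and the sketch assumes the candidate with the strictly smallest bit count wins, with ties going to the earliest candidate.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a syncmer; only the hash matters here.
struct Syncmer {
    uint64_t hash;
};

// Hypothetical helper: scan the window [i1 + w_min, i1 + w_max] and return the
// index whose hash minimizes popcount((strobe1.hash ^ hash) & q).
// Assumes i1 + w_min is a valid index into syncmers.
std::size_t pick_strobe2(const std::vector<Syncmer>& syncmers, std::size_t i1,
                         std::size_t w_min, std::size_t w_max, uint64_t q) {
    const Syncmer& strobe1 = syncmers[i1];
    std::size_t best = i1 + w_min;
    std::size_t best_count = 65;  // larger than any 64-bit popcount
    for (std::size_t i = i1 + w_min; i <= i1 + w_max && i < syncmers.size(); ++i) {
        std::bitset<64> b = (strobe1.hash ^ syncmers[i].hash) & q;
        if (b.count() < best_count) {  // strict '<': earliest minimum wins
            best_count = b.count();
            best = i;
        }
    }
    return best;
}
```

With w_min = 0 the first candidate is strobe1 itself, its XOR with its own hash is 0, and no later candidate can have a strictly smaller count, so pick_strobe2 returns i1. With w_min = 1 the window starts one syncmer later and a genuinely different strobe is selected.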

With mcs, picking k-mers (w_min = 0) should be strictly worse than any other parameter setting, since mcs already gives us the k-mers for free.