For logging and coming back to later:
Currently, we have `randstrobe(l, u, q, max_dist, std::max(0, k / (k - s + 1) + l), k / (k - s + 1) + u)`.
This is primarily for our parameter optimization, but if the parameters are chosen such that `k / (k - s + 1) + l <= 0`, then `w_min` is set to 0. Because of our link function, `std::bitset<64> b = (strobe1.hash ^ syncmers[i].hash) & q;`, `w_min = 0` will deterministically pick strobe1 = strobe2 (a hash XORed with itself gives 0, which no other candidate can undercut) and thus effectively emulate k-mers. For such a setting and some read lengths, I observed a significant drop in map rate (over 5%), increased runtime, and often, but not always, a decrease in accuracy. This may have been the problem with our initial parameter optimization?
With mcs, picking k-mers (`w_min = 0`) should be strictly worse than any other parameters, since we get the k-mers for free.
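
To make the degenerate case concrete, here is a minimal sketch, not the actual implementation. It assumes the second strobe is the candidate in the window that minimizes the number of set bits in `b`, with the first candidate winning ties. `Syncmer`, `window_min`, and `pick_second_strobe` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a syncmer; only the hash matters here.
struct Syncmer {
    uint64_t hash;
};

// Lower window bound, clamped at 0 as in the randstrobe call above.
int window_min(int k, int s, int l) {
    return std::max(0, k / (k - s + 1) + l);
}

// Assumed selection rule: among syncmers[i + w_min .. i + w_max], return the
// index minimizing popcount((strobe1.hash ^ candidate.hash) & q).
// Ties keep the earliest candidate; an empty window returns i itself.
std::size_t pick_second_strobe(const std::vector<Syncmer>& syncmers, std::size_t i,
                               std::size_t w_min, std::size_t w_max, uint64_t q) {
    const Syncmer& strobe1 = syncmers[i];
    std::size_t best = i;
    std::size_t best_count = std::numeric_limits<std::size_t>::max();
    for (std::size_t j = i + w_min; j <= i + w_max && j < syncmers.size(); ++j) {
        std::bitset<64> b = (strobe1.hash ^ syncmers[j].hash) & q;
        if (b.count() < best_count) {
            best_count = b.count();
            best = j;
        }
    }
    return best;
}
```

With the example values k = 20, s = 16, l = -5 (chosen only for illustration), `window_min` returns 0. Once `w_min` is 0, the first candidate is strobe1 itself, `b` has no bits set, and under the assumed rule no later candidate can replace it, so the seed degenerates to a single k-mer.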