ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Optimized parameters for read length 50 decrease accuracy for read length 75 #395

Closed: marcelm closed this issue 4 months ago

marcelm commented 4 months ago

Switching from (20, 16, -3, 2) to (18, 14, -2, 1) improved accuracy for read length 50, but had the unintended side effect of reducing accuracy for read length 75, which is mapped to canonical read length 50 and therefore uses the same parameter settings.

The data showing this was already available in this table, which indicates that (20, 16, -3, 2) is optimal for read length 75.

Do we need to add canonical read length 75?
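
To make the problem concrete, here is a minimal sketch of how a read length could be mapped to a canonical read length and its parameter tuple. This is not strobealign's actual code: the `Profile` type, the `select_profile` helper and the cutoff values (60 and 90) are assumptions made up for illustration. Only the two parameter tuples and the canonical lengths 50 and 75 come from this thread.

```python
from typing import List, NamedTuple, Tuple


class Profile(NamedTuple):
    canonical_read_length: int
    max_read_length: int  # hypothetical cutoff, NOT taken from strobealign
    params: Tuple[int, int, int, int]  # the 4-tuple discussed in this issue


# Current situation: one profile covers both read length 50 and 75.
# The cutoff of 90 is an illustrative assumption.
BEFORE: List[Profile] = [
    Profile(50, 90, (18, 14, -2, 1)),
]

# Proposed situation: a separate canonical read length 75 keeps the old tuple.
# The cutoffs of 60 and 90 are illustrative assumptions.
AFTER: List[Profile] = [
    Profile(50, 60, (18, 14, -2, 1)),
    Profile(75, 90, (20, 16, -3, 2)),
]


def select_profile(read_length: int, profiles: List[Profile]) -> Profile:
    """Return the first profile whose cutoff is >= read_length.

    Falls back to the last profile for reads longer than every cutoff.
    """
    for profile in profiles:
        if read_length <= profile.max_read_length:
            return profile
    return profiles[-1]


if __name__ == "__main__":
    for name, table in (("before", BEFORE), ("after", AFTER)):
        for read_length in (50, 75):
            p = select_profile(read_length, table)
            print(name, read_length, p.canonical_read_length, p.params)
```

With the single-profile table, read length 75 inherits the tuple tuned for read length 50. With the extra entry, each read length gets the tuple that was measured as best for it.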

ksahlin commented 4 months ago

Given our email discussion, the improvement looks significant enough to consider a new canonical read length around 75 bp. I would vote yes, as I don't see any downside to it.

marcelm commented 4 months ago

The only downside I can see is the additional disk space needed by those who want to store indices for all possible read lengths, but since we’ve optimized index creation quite a bit, that use case is less and less relevant.

ksahlin commented 4 months ago

Yes, I agree that it's a minor cost in comparison.