marcelm closed this 4 months ago
Given the email discussion, the improvement looks significant enough to consider a new canonical read length around 75 bp. I would vote yes, as I don't see any downside to it.
The only downside I can see is the additional disk space for those who want to store indices for all possible read lengths, but since we’ve optimized index creation quite a bit, that use case is becoming less and less relevant.
Yes, I agree that it's a minor cost in comparison.
Switching from (20, 16, -3, 2) to (18, 14, -2, 1) improved accuracy for read length 50, but had the unintended side effect of reducing it for read length 75, which is mapped to canonical read length 50 and therefore uses the same parameter settings.
The data to see this was already available in this table, which says that (20, 16, -3, 2) is optimal for read length 75.
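To make the mapping issue concrete, here is a minimal sketch in Python. It is not the project's actual code: the function names, the dictionary layout, and the "nearest canonical length" rule are assumptions for illustration, and other canonical read lengths are left out. Only the two parameter tuples are taken from the discussion above.

```python
# Hypothetical profile tables: canonical read length -> parameter tuple.
# Only the tuples come from this thread; everything else is illustrative.
PROFILES_CURRENT = {
    50: (18, 14, -2, 1),
}
PROFILES_WITH_75 = {
    50: (18, 14, -2, 1),
    75: (20, 16, -3, 2),
}


def canonical_read_length(read_length, profiles):
    """Pick the closest canonical read length (assumed rule; ties go to the smaller one)."""
    return min(sorted(profiles), key=lambda c: abs(c - read_length))


def parameters_for(read_length, profiles):
    """Return the parameter tuple of the canonical read length chosen for read_length."""
    return profiles[canonical_read_length(read_length, profiles)]


if __name__ == "__main__":
    # Without a canonical length of 75, reads of length 75 inherit the settings tuned for 50.
    print(parameters_for(75, PROFILES_CURRENT))
    # With a canonical length of 75, they get their own settings back.
    print(parameters_for(75, PROFILES_WITH_75))
```

Under these assumptions, adding the 75 entry leaves the settings for read length 50 unchanged while restoring (20, 16, -3, 2) for read length 75.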
Do we need to add canonical read length 75?