Open rhysnewell opened 2 years ago
For both the C and rust code, the main reason for this constraint is the size of the backtrack array, which is a two dimensional array of i16
. I think that changing this to i32
would roughly double the memory usage and maybe cause some slowdown, but I don't know how much without measuring.
In addition to replacing i16
with i32
, you would need to change the AVX instructions that do the packing (_mm256_packs_epi32
). I'm reluctant to change the C code to do this, since then we stop being the same as upstream GKL, which is the main reason for having that C code there. However, it would be feasible to change the rust code and see how it affects performance.
Hmm, I'm not sure I have that right... let me refresh my memory of how this works.
I think the backtrack array only contains the insert/delete flags, so it could even be u8
. I added assert!(btrack <= 15);
here and it was never hit during the tests:
https://github.com/philipc/gkl-rs/blob/9eec44205839fc9e77a4acbe3740e0dab92254af/src/smithwaterman.rs#L523-L524
So I don't think the backtrack array is the reason for the limit.
Instead, I'd be concerned about overflows in the 32-bit calculations in compute
, but I'd have to look closer to determine where they might occur.
I think the only place that overflow can occur is here: https://github.com/philipc/gkl-rs/blob/9eec44205839fc9e77a4acbe3740e0dab92254af/src/smithwaterman.rs#L350
Underflow isn't a concern because we always take the maximum of two values, so it is limited by low_init_value
.
So the limit is that MAX_SW_SEQUENCE_LENGTH * MAXIMUM_SW_MATCH_VALUE
< i32::MAX
. Currently the code limits these values individually, but I don't see why we couldn't limit the product instead.
Hi Philip,
Just playing around with large SW alignments and have noticed that the maximum alignment length is capped due to i16 constraints: https://github.com/philipc/gkl-rs/blob/9eec44205839fc9e77a4acbe3740e0dab92254af/src/smithwaterman.rs#L24 How feasible would it be to remove this constraint to allow for larger alignments? Doesn't seem like allowing larger integers would impact the C side of GKL too much, but I'm not sure if it would break other components
Cheers, Rhys