Open adamnovak opened 1 year ago
Yeah, GSSW has 8-bit and 16-bit scoring modes: https://github.com/vgteam/gssw/blame/14b4d43736bb606c3fc97c4724d1959d13550d37/src/gssw.h#L28
I think it just can't deal with sequences this long.
We need logic in Giraffe that will bail out on or fall back from the fallback alignment code path if the sequence is more than... 15 kbp? 30 kbp?
We could do something like in the BandedGlobalAligner wrapper in Aligner, where it chooses based on the maximum score. I'm not sure what the fallback would be, though: we don't really have an alignment module that can do arbitrary POA with integers that large.
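To make the "choose based on the maximum score" idea concrete, here is a rough sketch in Python rather than vg's C++. Nothing here is vg or GSSW code: `best_possible_score`, `choose_score_bits`, the `full_length_bonus` parameter, and the assumption that scores are stored as signed integers are all made up for illustration.

```python
def best_possible_score(read_length, match_score, full_length_bonus=0):
    """Upper bound on any alignment score: every base matches, plus an
    optional bonus at each end (hypothetical scoring scheme)."""
    return read_length * match_score + 2 * full_length_bonus


def choose_score_bits(read_length, match_score, full_length_bonus=0,
                      widths=(8, 16, 32, 64)):
    """Return the narrowest signed integer width (in bits) that can hold the
    best possible score, or None if none of the offered widths is enough."""
    bound = best_possible_score(read_length, match_score, full_length_bonus)
    for bits in sorted(widths):
        if bound <= 2 ** (bits - 1) - 1:
            return bits
    return None


# With only 8-bit and 16-bit modes on offer, None is the signal to bail out
# or fall back to something else.
short_read = choose_score_bits(100, 1, widths=(8, 16))
medium_read = choose_score_bits(15000, 1, widths=(8, 16))
long_read = choose_score_bits(58122, 1, widths=(8, 16))
```

Under those assumptions, a caller would check for `None` before ever handing the sequence to the aligner, which is where the bail-out or fallback decision would live.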
I tried this problem with gfatools ed and I got like a 26000 edit distance in a couple of minutes. I should try the reverse orientation for the sequence to see if that is plausibly fast.
Grab the files in problem.tar.gz and run command.sh, which does:

seq.txt is 58122 bp of sequence, and the graph is a bit more graph than that. I'm not entirely sure the sequence is in the right orientation here (I think it's supposed to really have its end base pinned to the reverse strand of node 28436835 in the internal Giraffe operation I am trying to match), but regardless I've managed to trigger the bug:
My first guess is that GSSW is overflowing because the alignment is about as long as a 16-bit int can count.
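A quick back-of-the-envelope check of that guess, as a sketch. The match score of 1 is an assumption, not something read from the failing run, and whether the 16-bit mode behaves as signed or unsigned is also not confirmed here, so both limits are shown.

```python
READ_LENGTH = 58122   # bp, the length of seq.txt given in this issue
MATCH_SCORE = 1       # assumption: one point per matching base

INT16_MAX = 2 ** 15 - 1    # 32767, largest signed 16-bit value
UINT16_MAX = 2 ** 16 - 1   # 65535, largest unsigned 16-bit value

# Best score a perfect end-to-end alignment of the read could reach.
best_score = READ_LENGTH * MATCH_SCORE

overflows_signed = best_score > INT16_MAX
overflows_unsigned = best_score > UINT16_MAX

# Longest read whose perfect-match score still fits in a signed 16-bit int.
longest_safe_signed = INT16_MAX // MATCH_SCORE

print(best_score, overflows_signed, overflows_unsigned, longest_safe_signed)
```

So under these assumptions a perfect alignment would score past the signed 16-bit limit but still under the unsigned one, and any match score of 2 or more would exceed both. That also puts a signed 16-bit cutoff somewhere around 32 kbp for a match score of 1.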