mengyao / Complete-Striped-Smith-Waterman-Library


SegFault in Banded_SW when band_width is large #49

Closed: bnbowman closed this issue 2 years ago

bnbowman commented 7 years ago

When aligning sequences of dissimilar size, I occasionally get a SegFault deep within `banded_sw`. What appears to be happening is that the initial boundary search finds a partially garbage alignment (one containing long indels) in the longer sequence. That creates a large length discrepancy between the two aligned regions, and therefore a large band_width going into banded_sw, which breaks something deep in the aligner:

RefEnd=759, RefStart+1=1, RefLen=760, ReadEnd=5332, ReadStart+1=1483, ReadLen=3851, BandWidth=3092
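The reported numbers fit a band width equal to the length difference of the two aligned regions plus one: 3851 - 760 + 1 = 3092. A minimal sketch of that relationship, assuming 0-based inclusive coordinates (as the "Start+1" columns suggest); `initial_band_width` is a hypothetical helper for illustration, not code taken from ssw.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not the SSW API): initial band width taken as the
   length difference of the two aligned regions plus one. Coordinates are
   assumed 0-based and inclusive, matching the "Start+1" columns above. */
static int initial_band_width(int ref_begin, int ref_end,
                              int read_begin, int read_end) {
    assert(ref_begin <= ref_end && read_begin <= read_end);
    int ref_len = ref_end - ref_begin + 1;    /* 760 in the report */
    int read_len = read_end - read_begin + 1; /* 3851 in the report */
    return abs(read_len - ref_len) + 1;
}
```

`initial_band_width(0, 759, 1482, 5332)` returns 3092, so a long spurious alignment on the read side alone is enough to push the band past 3000.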

Sample sequences are attached below. The alignment parameters used were: Match=2; Mismatch=5; GapInit=3; GapExtend=3.

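problem_seqs.zip (https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library/files/1006117/problem_seqs.zip)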

mengyao commented 7 years ago

Dear Brett,

Thank you very much for reporting the error and sending the data.

SSW currently runs into problems in some cases when GapInit <= GapExtend. Biologically meaningful parameters should have GapInit > GapExtend. I hope that replacing GapInit=3 with GapInit=4 will work for you.

I will still look into the error later, but it may take a long time.
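Why GapInit > GapExtend matters can be seen from the gap cost itself. A minimal sketch, assuming the common affine convention in which a gap of length k costs GapInit + (k - 1) * GapExtend (an assumption here, not taken from the SSW source); `affine_gap_cost` and `gap_penalties_ok` are hypothetical helpers, not SSW functions. With GapInit = GapExtend = 3 the model collapses to a linear penalty: one 2-base gap costs 6, exactly as much as two separate 1-base gaps, so the aligner has no incentive to consolidate indels.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not the SSW API). Assumes the affine model where a
   gap of length k costs gap_open + (k - 1) * gap_extend. */
static int affine_gap_cost(int gap_open, int gap_extend, int k) {
    assert(k >= 1);
    return gap_open + (k - 1) * gap_extend;
}

/* True when opening a gap costs strictly more than extending one,
   which is the condition recommended in this thread. */
static bool gap_penalties_ok(int gap_open, int gap_extend) {
    return gap_open > gap_extend;
}
```

With GapInit=4 and GapExtend=3, one 2-base gap costs 7, less than the 8 charged for two 1-base gaps, so a single longer indel is preferred.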

Yours,

Mengyao

(In reply to Brett Bowman (@BioInfoBrett), Tue, May 16, 2017 at 7:36 PM.)

corwinjoy commented 6 years ago

Hi, I had the same problem. Looking into it, the loop in banded_sw sometimes never reaches the maximum score, so it keeps doubling the band width indefinitely until the program crashes. I'm not sure what the best fix is, but I simply bounded the loop. The original loop in banded_sw looks like:

`do { ... } while (LIKELY(max < score));`

I changed this to:

`do { ... } while (LIKELY(max < score) && LIKELY(band_width <= readLen));`

As I understand it, a band_width larger than readLen doesn't make sense anyway, so I think this is a reasonable fix.
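The bounded retry can be sketched in isolation as below. This is not SSW's banded_sw: `widen_band`, the `band_score_fn` callback standing in for one banded DP pass, and the `toy_pass` model are all hypothetical, introduced only to show that the extra condition guarantees termination even when the target score is never reached.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for one banded DP pass: returns the best score
   found with a band of the given width. Not part of the SSW API. */
typedef int (*band_score_fn)(int band_width, void *ctx);

/* Double the band until the pass reproduces the target score, but stop
   once the band has grown past max_band (e.g. readLen). Returns the last
   band width tried; *reached receives the score from that pass. */
static int widen_band(band_score_fn pass, void *ctx, int score,
                      int band_width, int max_band, int *reached) {
    assert(band_width >= 1);         /* doubling 0 would never grow */
    assert(max_band <= INT_MAX / 2); /* keeps band_width * 2 from overflowing */
    int max;
    for (;;) {
        max = pass(band_width, ctx);
        if (max >= score || band_width > max_band)
            break;                   /* target reached, or the added bound */
        band_width *= 2;
    }
    *reached = max;
    return band_width;
}

/* Toy model: the pass reaches full_score only once the band is at least
   needed_band wide, and otherwise falls one point short. */
struct toy_pass { int full_score; int needed_band; };

static int toy_band_score(int band_width, void *ctx) {
    const struct toy_pass *t = ctx;
    return band_width >= t->needed_band ? t->full_score : t->full_score - 1;
}
```

With `struct toy_pass never = {100, 1 << 30};` (a score that is never reachable), `widen_band(toy_band_score, &never, 100, 5, 760, &r)` stops at a band width of 1280 with `r == 99` instead of doubling forever. Unlike the one-line patch, this sketch evaluates the starting width before the first doubling; that ordering is a choice of the sketch.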

Corwin

sammysheep commented 4 years ago

@corwinjoy Your suggestion also fixed my issue, which occurred when I set too small a gap open penalty (0) while aligning a long read to a long reference.

Is there a scenario where band_width might need to be larger than readLen, perhaps when the read contains an indel?

corwinjoy commented 4 years ago

That is a good point. If your read has dropped a number of bases, the band width needed could exceed the read length. I would still apply the same fix, but set the maximum to readLen plus some tolerance that depends on the source of your reads.
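The "readLen plus some tolerance" bound can be expressed as a small helper. This is a sketch under stated assumptions: `band_cap` is a hypothetical name, and expressing the tolerance as a maximum deletion rate in parts per thousand is one possible choice, not something prescribed in this thread. Its result would take the place of readLen in the loop condition `band_width <= readLen`.

```c
#include <assert.h>

/* Hypothetical cap for the band-doubling loop: the read length plus a
   tolerance for bases the read may have dropped. The tolerance is given
   as a maximum deletion rate in parts per thousand and rounded up; the
   right rate depends on the sequencing platform (an assumption). */
static int band_cap(int read_len, int del_permille) {
    assert(read_len >= 0 && del_permille >= 0);
    /* ceil(read_len * del_permille / 1000), computed in 64-bit to avoid overflow */
    long long extra = ((long long)read_len * del_permille + 999) / 1000;
    return read_len + (int)extra;
}
```

For the 3851-base read region in the original report, allowing up to 10% dropped bases gives `band_cap(3851, 100) == 4237`.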