torognes / swipe

Smith-Waterman database searches with inter-sequence SIMD parallelisation
GNU Affero General Public License v3.0
58 stars, 21 forks

no similarities of long sequences are computed #10

Closed: tolot27 closed this issue 10 years ago

tolot27 commented 10 years ago

For long protein sequences, i.e. of about 6,800 aa, no similarities are computed. The easiest check is to align one of the sequences to itself. I've verified this with one particular sequence and found that it stops working at exactly 6,803 aa. However, it appears to be composition- or score-dependent, because the self-alignment of another long sequence stopped working somewhere between 6,650 and 6,700 aa.

Aligning these two sequences against each other returns an alignment.

I can provide a test file, but I can't attach it to this issue because its file extension is not supported.

I've tested it with both the precompiled and a self-compiled binary on Linux (FC16) and on a different server, both supporting SSSE3.

tolot27 commented 10 years ago

I've tracked the problem down to scores of 2^15 (32,768) or greater.
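The length thresholds from the first report fit this explanation. A rough check, assuming the optimal self-alignment is the full-length ungapped diagonal, so its score is the sum of the substitution-matrix diagonal entries s(a_i, a_i) (the thread does not name the matrix), gives the average per-residue self-score implied by hitting 2^15:

```latex
\begin{aligned}
S(L) &= \sum_{i=1}^{L} s(a_i, a_i) \quad \text{(full-length diagonal, assumed optimal)} \\
\text{seq.\ 1: } S(6803) \ge 2^{15} = 32768 &\;\Rightarrow\; \bar{s} \approx 32768 / 6803 \approx 4.82 \\
\text{seq.\ 2: } 6650 < L^{*} \le 6700 &\;\Rightarrow\; \bar{s} \approx 32768 / L^{*} \in [4.89,\, 4.93)
\end{aligned}
```

The slightly higher average for the second sequence is consistent with the cutoff being composition-dependent: a sequence rich in high-scoring residues reaches 2^15 at a shorter length.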

The problem is in search16s.cc, line 446:

long score = ((WORD*)&S)[c] - 0x8000;

S is a 128-bit register holding eight 16-bit signed values, each initialized to 0x8000. This constant is then subtracted at line 446.

In one rare case, we also encountered a swipe crash related to scores greater than 2^15.

tolot27 commented 10 years ago

It works. Great! Many thanks for fixing this issue so quickly.