mengyao / Complete-Striped-Smith-Waterman-Library

incorrect alignment #80

Closed boristim66 closed 2 years ago

boristim66 commented 2 years ago

Dear developers! An analysis of nearly 100,000 unique hCoV-19 protein sequences from the GISAID database found an alignment shift for several closely related sequences; samples are attached. As you can see, these samples are aligned to begin with, but the program reports ref_begin1 = 1 and read_begin1 = 2, which shifts the result. I hope you can find and fix this bug.

Command line: -p -c testcopy2.fa query.fasta

Sincerely, Boris

Attachment: ssw_example.zip

boristim66 commented 2 years ago

The changes made according to issue #61 solve the problem, but those from #78 do not.

jeffdaily commented 2 years ago

Looks like you found the solution. There is a bug in how SSW generates the cigar. For the first sequence in testcopy2.fa the cigar returned by SSW is M1271.

I'm the author of the similar parasail library that expands the implementation to cover sse41 and avx2 ISAs. It seems parasail handles your inputs correctly. The cigar returned by parasail is 1=1D1=1X478=1X129=1X561=1X97=.

parasail_aligner -o 3 -e 1 -m blosum50 -f testcopy2.fa -q query.fasta -O SSW -a sw_trace_striped_sse2_128_16
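To make the difference between the two CIGAR strings above concrete, here is a minimal Python sketch that counts how many query and reference residues each one consumes. It is not part of SSW or parasail; the helper name `cigar_spans` is hypothetical, and it assumes the SSW result quoted as M1271 corresponds to `1271M` in the usual count-then-operation notation.

```python
import re

# Extended CIGAR operations, grouped by which sequence they advance.
QUERY_OPS = set("MIS=X")   # consume query (read) residues
REF_OPS = set("MDN=X")     # consume reference residues

def cigar_spans(cigar):
    """Return (query_len, ref_len) consumed by a CIGAR string.

    Hypothetical helper for illustration; expects <count><op> pairs.
    """
    query_len = 0
    ref_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(count)
        if op in QUERY_OPS:
            query_len += n
        if op in REF_OPS:
            ref_len += n
    return query_len, ref_len

# CIGAR reported by parasail in this thread.
parasail_cigar = "1=1D1=1X478=1X129=1X561=1X97="
# Assumption: the SSW output quoted as "M1271" means 1271 match/mismatch columns.
ssw_cigar = "1271M"

print("parasail:", cigar_spans(parasail_cigar))
print("ssw:     ", cigar_spans(ssw_cigar))
```

Under these assumptions both strings cover 1271 query residues, but the parasail CIGAR spans 1272 reference residues because of the single deletion after the first match. A gapless 1271M has no place for that deletion, which fits the one-position shift described in the report.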

boristim66 commented 2 years ago

Thank you very much, @jeffdaily. You have done a great job, I am delighted. With your permission, I will use your development.

jeffdaily commented 2 years ago

No permission needed. It's open source.

mengyao commented 2 years ago

Thanks, all. This bug is fixed in the new version.