mengyao / Complete-Striped-Smith-Waterman-Library


Discrepancy between C implementation and Python wrapper #71

Closed. jjtapia closed this issue 2 years ago.

jjtapia commented 4 years ago

Consider the following sequences:

>dummy
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>ref_dummy
AAAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

The two differ by a single nucleotide (ref_dummy has a T instead of an A at position 27). Running

./ssw_test target.fasta ref.fasta -c -s -h

returns an alignment with the CIGAR string

26=1X33=

which I believe is correct. Now consider the equivalent Python code using ssw-py:

from ssw import SSW
seq = 'A' * 60                         # query: 60 A's, same as ">dummy"
ref_seq = 'A' * 24 + 'AAT' + 'A' * 33  # reference: T at position 27, same as ">ref_dummy"
tester = SSW()
tester.setRead(seq)
tester.setReference(ref_seq)
res = tester.align()
print(res.CIGAR)

This returns the CIGAR string

60M

The same thing happens with the default Python code in the repository:

res = ssw_main({'bPath':True, 'bSam':True, 'query':seq, 'targetseq':ref_seq, 'sRId': 'ref', 'bHeader':True, 'targetname':'dummy_ref'})
print(res.getvalue())

This also returns a CIGAR string of 60M, which completely misses the mismatch.
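For context: in the SAM format specification, `M` denotes an alignment match, which can be either a sequence match or a mismatch, while `=` and `X` are the operations that distinguish the two. The C output (26=1X33=) and the Python output (60M) therefore appear to describe the same 60-column, gap-free alignment, differing only in whether the mismatch is spelled out. If the wrapper only reports `M`, one possible workaround is to post-process its CIGAR against the input sequences. The sketch below assumes exactly that: `expand_m_ops` is a hypothetical helper, not part of ssw-py or this library, and it assumes you know the 0-based offsets at which the alignment starts in the query and the reference (how to read those from the wrapper's result object is not covered here). Bases are compared case-sensitively.

```python
import re
from itertools import groupby

# Operations that consume query bases and reference bases, per the SAM spec.
QUERY_OPS = set("MIS=X")
REF_OPS = set("MDN=X")


def expand_m_ops(cigar, query, ref, query_start=0, ref_start=0):
    """Hypothetical helper: rewrite every M in `cigar` as runs of = and X.

    query_start and ref_start are the 0-based offsets at which the first
    CIGAR operation begins in each sequence (an assumption of this sketch).
    """
    columns = []  # one operation character per unit of CIGAR length
    qi, ri = query_start, ref_start
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "M":
            # Resolve each aligned column into a match (=) or a mismatch (X).
            columns.extend(
                "=" if query[qi + k] == ref[ri + k] else "X" for k in range(n)
            )
        else:
            # Other operations are copied through unchanged.
            columns.extend(op * n)
        if op in QUERY_OPS:
            qi += n
        if op in REF_OPS:
            ri += n
    # Run-length encode the per-column operations back into a CIGAR string.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(columns))


query = "A" * 60
ref = "A" * 26 + "T" + "A" * 33
print(expand_m_ops("60M", query, ref))  # 26=1X33=
```

With the sequences from this issue, the 60M result expands to 26=1X33=, the same string ssw_test reports. Insertions, deletions and soft clips are passed through, so the helper also works on gapped alignments as long as the start offsets are right.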