jeffdaily / parasail-python

Python bindings for the parasail C library.
Other
87 stars 17 forks source link

Weird traceback/cigar #22

Closed ksahlin closed 6 years ago

ksahlin commented 6 years ago

Hi, I'm observing strange cigars on some of my alignments. They usually have a long deletion at the end and matches the very last base pair. I've attached one of the examples below and have a handful more if you want additional cases.

>>> import parasail
>>> parasail.__version__
'1.1.11'
>>> user_matrix = parasail.matrix_create("ACGT", 2, -3)
>>> result = parasail.nw_trace_scan_16(s1, s2, 10, 1, user_matrix)
>>> s2 = "AGTTCGTCAGTTTATGATCTTCACACAATCTAGTGAAAAAACAGCAGTAAGTTGTCTTTCTCAAAATGACTGGAAGTTAGATGTTGCAACAGATAATTTTTCCAAAATCCTGAACTTTATATACGAGAGAGTGTAAAAGGATCATTGGACAGGAAGAAGTTAGAACAGCTGTACAATAGATACAAAGACCCTCAAGATGAGAATAAAATTGGAATAGATGGCATACAGCAGTTCTGTGATGACCTGGCACTCGATCCAGCCAGCATTAGTGTGTTGATTATTGCATGGAAGTTCAGAGCAGCAACACAGTGCGGTTTCTCCAAACAGGAGTTCATGGATGGCATGACAGAATTAGGATGTGACAGCATAGAAAAACTAAAGGCCCAGATACCCAAGATGGAACAAGAATTGAAGAACCAGGACGATTTAAGGATTTTACCAGTTTACTTTTAATTTTGCAAAGAATCCAGGACAAAAAGGATTAGATCTAGAAATGGCCATTGCCTACTGGAACTTAGTGCTTAATGGAAGATTTAAATTCTTAGACTTATGGAATAAATTTTTGTTGGAACATCATAAACGATCAATACCAAAAGACACTTGGAATCTTCTTTTAGACTTCAGTACGATGATTGCAGATGACATGTCTAATTATGATGAAGAAGGAGCATGGCCTGTTCTTATTGATGACTTTGTGGAATTTGCACGCCCTCAAATTGCTGGGACAAAAAGTACAACAGTGTAGCACTAAAGGAACCTTCTAGAATGTACATAGTCTGTACAATAAATACAACAGAAAATTGCACAGTCAATTTCTGCTGGCTGGACTGAACTGAAGATCAATCCTCACAATTCAGACTGAGGGTTGAGACAAAACTTTAAGGATACATCTTGGACCATATCGTATTTCATTCTTCTAATGGTGGTTTGGGCTTGTCTTCTAGTCTGGGCCGCTCTAAACATTTATAATTCCAACATTGTGGATTTCATCTTATATCTGTGGACCATCCTAGTTTATTCTCCCATAAGTCTTAGAAGCTTTATGGTGATTATTTTGAGGTTTTCATTCTCGCATAAAGCACAATGCTGTCTTCATCAGAAAACAGTTGGCATAAGAATTAAACATATG"
>>> s1 = "AGCTGTTACTGCAAGTCCATGCCTCTACCATGAGGGTGGAGGAGTTAAGATCAACAGATCCACATGTACCTTGAGGTGACAGACTGGCTCTGAACAAGTTGAAATCATCGCAGAAGGATAAAGTTCGTCAGTTTATGATCTTCACACAATCTAGTGAAAAAACAGCAGTAAGTTGTCTTTCTCAAAATGACTGGAAGTTAGATGTTGCAACAGATAATTTTTTCCAAAATCCTGAACTTTATATACGAGAGAGTGTAAAAGGATCATTGGACAGGAAGAAGTTAGAACAGCTGTACAATAGATACAAAGACCCTCAAGATGAGAATAAAATTGGAATAGATGGCATACAGCAGTTCTGTGATGACCTGGCACTCGATCCAGCCAGCATTAGTGTGTTGATTATTGCATGGAAGTTCAGAGCAGCAACACAGTGCGAGTTCTCCAAACAGGAGTTCATGGATGGCATGACAGAATTAGGATGTGACAGCATAGAAAAACTAAAGGCCCAGATACCCAAGATGGAACAAGAATTGAAAGAACCAGGACGATTTAAGGATTTTTACCAGTTTACTTTTAATTTTGCAAAGAATCCAGGACAAAAAGGATTAGATCTAGAAATGGCCATTGCCTACTGGAACTTAGTGCTTAATGGAAGATTTAAATTCTTAGACTTATGGAATAAATTTTTGTTGGAACATCATAAACGATCAATACCAAAAGACACTTGGAATCTTCTTTTAGACTTCAGTACGATGATTGCAGATGACATGTCTAATTATGATGAAGAAGGAGCATGGCCTGTTCTTATTGATGACTTTGTGGAATTTGCACGCCCTCAAATTGCTGGGACAAAAAGTACAACAGTGTAGCACTAAAGGAACCTTCTAGAATGTACATAGTCTGTACAATAAATACAACAGAAAATTGCACAG"
>>> result = parasail.nw_trace_scan_16(s1, s2, 10, 1, user_matrix)
>>> cigar_string = str(result.cigar.decode).decode('utf-8')
>>> cigar_string
u'121I96=1I217=2X96=1I22=1I374=321D1='

I would not expect the last 321D stretch as s1 and s2 matches perfectly in the ends. Here is the result:

AGCTGTTACTGCAAGTCCATGCCTCTACCATGAGGGTGGAGGAGTTAAGATCAACAGATCCACATGTACCTTGAGGTGACAGACTGGCTCTGAACAAGTTGAAATCATCGCAGAAGGATAAAGTTCGTCAGTTTATGATCTTCACACAATCTAGTGAAAAAACAGCAGTAAGTTGTCTTTCTCAAAATGACTGGAAGTTAGATGTTGCAACAGATAATTTTTTCCAAAATCCTGAACTTTATATACGAGAGAGTGTAAAAGGATCATTGGACAGGAAGAAGTTAGAACAGCTGTACAATAGATACAAAGACCCTCAAGATGAGAATAAAATTGGAATAGATGGCATACAGCAGTTCTGTGATGACCTGGCACTCGATCCAGCCAGCATTAGTGTGTTGATTATTGCATGGAAGTTCAGAGCAGCAACACAGTGCGAG-TTCTCCAAACAGGAGTTCATGGATGGCATGACAGAATTAGGATGTGACAGCATAGAAAAACTAAAGGCCCAGATACCCAAGATGGAACAAGAATTGAAAGAACCAGGACGATTTAAGGATTTTTACCAGTTTACTTTTAATTTTGCAAAGAATCCAGGACAAAAAGGATTAGATCTAGAAATGGCCATTGCCTACTGGAACTTAGTGCTTAATGGAAGATTTAAATTCTTAGACTTATGGAATAAATTTTTGTTGGAACATCATAAACGATCAATACCAAAAGACACTTGGAATCTTCTTTTAGACTTCAGTACGATGATTGCAGATGACATGTCTAATTATGATGAAGAAGGAGCATGGCCTGTTCTTATTGATGACTTTGTGGAATTTGCACGCCCTCAAATTGCTGGGACAAAAAGTACAACAGTGTAGCACTAAAGGAACCTTCTAGAATGTACATAGTCTGTACAATAAATACAACAGAAAATTGCACA---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------G
-------------------------------------------------------------------------------------------------------------------------AGTTCGTCAGTTTATGATCTTCACACAATCTAGTGAAAAAACAGCAGTAAGTTGTCTTTCTCAAAATGACTGGAAGTTAGATGTTGCAACAGATAA-TTTTTCCAAAATCCTGAACTTTATATACGAGAGAGTGTAAAAGGATCATTGGACAGGAAGAAGTTAGAACAGCTGTACAATAGATACAAAGACCCTCAAGATGAGAATAAAATTGGAATAGATGGCATACAGCAGTTCTGTGATGACCTGGCACTCGATCCAGCCAGCATTAGTGTGTTGATTATTGCATGGAAGTTCAGAGCAGCAACACAGTGCG-GTTTCTCCAAACAGGAGTTCATGGATGGCATGACAGAATTAGGATGTGACAGCATAGAAAAACTAAAGGCCCAGATACCCAAGATGGAACAAGAATTG-AAGAACCAGGACGATTTAAGGA-TTTTACCAGTTTACTTTTAATTTTGCAAAGAATCCAGGACAAAAAGGATTAGATCTAGAAATGGCCATTGCCTACTGGAACTTAGTGCTTAATGGAAGATTTAAATTCTTAGACTTATGGAATAAATTTTTGTTGGAACATCATAAACGATCAATACCAAAAGACACTTGGAATCTTCTTTTAGACTTCAGTACGATGATTGCAGATGACATGTCTAATTATGATGAAGAAGGAGCATGGCCTGTTCTTATTGATGACTTTGTGGAATTTGCACGCCCTCAAATTGCTGGGACAAAAAGTACAACAGTGTAGCACTAAAGGAACCTTCTAGAATGTACATAGTCTGTACAATAAATACAACAGAAAATTGCACAGTCAATTTCTGCTGGCTGGACTGAACTGAAGATCAATCCTCACAATTCAGACTGAGGGTTGAGACAAAACTTTAAGGATACATCTTGGACCATATCGTATTTCATTCTTCTAATGGTGGTTTGGGCTTGTCTTCTAGTCTGGGCCGCTCTAAACATTTATAATTCCAACATTGTGGATTTCATCTTATATCTGTGGACCATCCTAGTTTATTCTCCCATAAGTCTTAGAAGCTTTATGGTGATTATTTTGAGGTTTTCATTCTCGCATAAAGCACAATGCTGTCTTCATCAGAAAACAGTTGGCATAAGAATTAAACATATG

Any idea what causes this? Thanks for your time!

ksahlin commented 6 years ago

To help troubleshooting, I can add that when I simply reverse the strings, i.e., aligning s1[::-1] to s2[::-1], the one base pair match now has the correct placement, but the other end instead has the problem. That is, the problem seems to happen in the end of the cigar.

>>> result = parasail.nw_trace_scan_16(s1[::-1], s2[::-1], 10, 1, user_matrix)
>>> cigar_string = str(result.cigar.decode, 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: str() takes at most 1 argument (2 given)
>>> cigar_string = str(result.cigar.decode).decode('utf-8')
>>> cigar_string
u'321D371=1I24=1I98=2X212=1I99=121I2='
ksahlin commented 6 years ago

oh, my bad! I just realized that I need to use semi-global to get the results that I want..

jeffdaily commented 6 years ago

You beat me to it, I was going to suggest semi-global. Glad you found it!

On May 23, 2018, at 5:02 PM, Kristoffer notifications@github.com<mailto:notifications@github.com> wrote:

oh, my bad! I just realized that I need to use semi-global to get the results that I want..

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/jeffdaily/parasail-python/issues/22#issuecomment-391541552, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AA3MOAwIBwcsLgLHr15bpkMABP5kzVzrks5t1ficgaJpZM4ULV1w.