Closed satta closed 8 years ago
A binary built with clang 3.6.2 seems to produce different (imho better) alignments and no such problems. I wonder if it's related to compiler issues? GCC 4.9 also produces working binaries.
Here's what the output looks like in this case:
target_name: chr3
query_name: chr1
optimal_alignment_score: 291 suboptimal_alignment_score: 289 strand: + target_begin: 680970 target_end: 682002 query_begin: 9 query_end: 998
Target: 680970 TTAAATTGTCA--ACATGGTA-TATATGAAATGAATTGGGTTTATAATAGTAAACCAATC 681026
|| |||*|*|| ||||*||| ||| |*| || || ||*||||||*|| |*
Query: 9 TT-AATCGCCACGACATAGTAGTAT-TTA---GA-----GT---TACTAGTAAGCC--TG 53
Target: 681027 TTG-CATTCCTAAAA---TA--AATCCTATTTAG--TTATCGT--------GTGTTATTT 681070
*|| ||*|*| |*|| || **|*||*||||| |*||*|| ||*||| |*
Query: 54 ATGCCACTAC-ACAATTCTAGCTTTTCTCTTTAGGATGATTGTTTCATTCAGTCTTA-TC 111
Target: 681071 TCTT----AATACAGTGTTGGATTAAGTTTGCTATTTTTTATTTTAAAATTTTACATTAT 681126
|||| ||*||| |*|||**||*| || |||*||**||||| |||| ||
Query: 112 TCTTTTAGAAAACA---TAGGAAAAAAT----TA--TTTAATAATAAAA-TTTA----AT 157
Target: 681127 TTTCTTCAGTGA---TATTGACCTGT---AGTTTTCTTTCATTATACTGTTTTATCAGAT 681180
|**| **|*||| || ||*|*|*| |||*| |||| *||| ||||| |||| |
Query: 158 TGGC-AAAATGAAGGTA-TGGCTTATAAGAGTGT--TTTC-CTAT--TGTTT--TCAG-T 207
Target: 681181 TTAGGTATCAGTGTTAT-AGTTTCT---TCATGAAAGGAAT---TAAGATGTTTTCCTTC 681233
*||||**|||*||||*| |*|**|| *||***|||||*| |||*||| ||*||
Query: 208 GTAGGACTCACTGTTCTAAATAACTGGGACACCCAAGGATTCTGTAAAATG----CCATC 263
Target: 681234 C-CTT-TCA---ATGTT------CTGAATAATCATACAGCAT-TA-TGACTATCTGCCAT 681280
| *|| ||| ||*|| ||*||*|*||||*|| ||| || |*|*|*|*| |
Query: 264 CAGTTATCATTTATATTCCCTAACTCAAAATTCATTCA-CATGTATTCATTTTTT----T 318
Target: 681281 TTAAATC--TTTAGAAT----AATTACCTTGTGAAACTATGTGGGCTTGGTGCTATTTTG 681334
*|||| | *||||*|| |||| ||*||*||| || |*|||*|*|**| | ***|
Query: 319 CTAAA-CAAATTAGCATGTAGAATT--CTGGTTAAA--AT-TTGGCATAGAAC-A-CCCG 370
Target: 681335 TGTAGTATTTATC-TAATTGTTTCCTATA--TTTC-TT--CTA--TGAAATTGGTTGATT 681386
*||| | |||*|| ||| ||***||*||| |*|| || ||| |||*|*||| |||||
Query: 371 GGTA-T-TTTTTCATAA-TGCACCCAATAACTGTCATTCACTAATTGAGAATGG-TGATT 426
Target: 681387 TAATCTTTCTAACACTAATAAGATTAATTTGAGTAAACTTCATTTCTCTAGAAAATAATT 681446
||| |*||***||||||**||| || |||| ||*|*| | |*||||*| |
Query: 427 TAA-----CAAAGGATAATAAAGTTA---TG---AAAC--CAATGC-C-ACAAAACA--T 469
Target: 681447 AATTCATCAAATTAAATCAAATTTGAAATTTGATTTCAAATTTATTTGTGTCAAAGTAA- 681505
***|| || |||*|****|*|| |*|| |*| *|*|*|*|||||****||||
Query: 470 CTGTC-TC----TAACTGGTGTGTG---TGTG-TGT---GTGTGTGTGTGTGTGTGTAAG 517
Target: 681506 A-GGAG-G--------TTTC-CT--GT-CATTTTAAAATTCTC----TGTTGTTTTAATA 681547
| |||| | |||| || *| || |||| |||| |*||*|||| *|*
Query: 518 AGGGAGAGAGAGAAAATTTCACTCCCTCCA---TAAA--TCTCACAGTATTCTTTT-CTT 571
Target: 681548 GTAACTT--GTTT-CTTATTATTTATAACTTTAT---ATATT--TGTACTT---TTTCCT 681596
*|**||| *||| |||**|*|| ||||*| *|||| |*|*||| |||||
Query: 572 TTTCCTTTCCTTTCCTTGCTCTT-----CTTTCTCTCCTATTGCTTTCCTTTCATTTCC- 625
Target: 681597 TTTTTATCAAATG--AACT--GAGTTT------TAAC-----ATCGCAT--TTAA---TT 681636
||*|*|| |||*| ||*| *|*|*| |||| ||*| || |*|| ||
Query: 626 TTCTCAT-AAAAGAAAAATAACAATATAGAAAATAACAAAATATAG-ATGGTCAACCTTT 683
Target: 681637 TTAATTATCTAATCTAAATCTAAAAGCTGATATTTGATTCAA-----TTGTC-AGAAATC 681690
|||| ||| |||**| *|*|||||| ||**||| ||*||| ||*|| |||*||
Query: 684 TTAA-TAT-TAAGGT-TACCTAAAA--TGCCATT--ATCCAAAGTGGTTCTCTAGAGAT- 735
Target: 681691 TCTTAAGTATATTGGGATCAACTTGGAAAAGTAAATTGACTTTAAACAGTACACTT--TA 681748
*||*|*|||||| |||| |*| | ||| || |||||**| || *|
Query: 736 GCTGATGTATAT--------ACTT--ACA--T--ATT---TT---ACAGTGTA-TTCAAA 774
Target: 681749 TGAACTTTAAATTACAGATCAGATAT--TTTTAGTAA--AA-TTTTATCCAAACTAAGAT 681803
|*||***||*||||||*|**|*|||| |||| |||| || ||||*|| |*|||
Query: 775 TAAAGAGTATATTACATAAGACATATCCTTTT-GTAACCAACTTTTGTC---ATTAA--- 827
Target: 681804 GCAATATGAGTATAATCACTGATTCAA-AAAGCTTCCTTTAGTGAGCATGAAATATC-TC 681861
|||| |*|*|**|*| || |||| ||| || || *|*| || |||| |
Query: 828 -CAAT-TTACTGGACT---TG--TCAACAAA----CC--TA--AATC-TG---TATCGT- 867
Target: 681862 ACTATTTTAAT---T---TTAATATTGATTGTGTGTTAAAATAATAATAATTGCATTAT- 681914
||| |||| | ||*||*||| ||*|| |||**||||*|***|*||*|
Query: 868 -CTA---TAATGGCTACGTTCATTTTG---GTATG----AATCTTAATTACCCCTTTCTG 916
Target: 681915 -ATTATTGATAATGTGTATTCTTGTCATGTTTCCATTCTTACTGGAAATG--CCTCCAGT 681971
|||||| ||||| ||| ||*||||*|*| ||*|||| ||||| |*||*|*|
Query: 917 CATTATT--TAATG---ATT-TTCTCATATGT-CACTCTT-----AAATGTACTTCTAAT 964
Target: 681972 GTTTC----TCCATCAAATAGTAT-ACTGGCTCTAA 682002
*|||| |*|||||*||| || |*|||*||*||
Query: 965 TTTTCACTTTACATCACATA--ATGAATGGATCCAA 998
CPU time: 9.230925 seconds
==3987==
==3987== HEAP SUMMARY:
==3987== in use at exit: 0 bytes in 0 blocks
==3987== total heap usage: 118 allocs, 118 frees, 75,801,268 bytes allocated
==3987==
==3987== All heap blocks were freed -- no leaks are possible
==3987==
==3987== For counts of detected and suppressed errors, rerun with: -v
==3987== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
After some debugging and hacking, I was able to produce a consistent build with the correct result after the following changes: https://github.com/satta/Complete-Striped-Smith-Waterman-Library/commit/c341ce25740a60966ce50d3e074509dcef231315
Apparently the result of to_cigar_int() varies across compilers; to be exact, the result after the bitwise OR operation. This leads to almost all operations being stored as M, corrupting the alignment representation.
The fact that optimization settings and inlining are involved makes me wonder if it's a boundary case in gcc being hit?
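For readers unfamiliar with the packing involved, here is a minimal sketch of how a CIGAR operation is commonly packed into a 32-bit integer with a shift and a bitwise OR. It assumes the BAM convention (length in the upper 28 bits, operation code in the lower 4 bits); the names pack_cigar, cigar_op and cigar_len are hypothetical and this is not the library's actual to_cigar_int() code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* BAM-style operation letters; the index in this string is the 4-bit op code.
 * (Assumed convention, not copied from the SSW sources.) */
static const char CIGAR_OPS[] = "MIDNSHP=X";

/* Pack a length and an operation letter into one 32-bit value:
 * upper 28 bits hold the length, lower 4 bits hold the op code. */
uint32_t pack_cigar(uint32_t length, char op_letter)
{
    const char *p = strchr(CIGAR_OPS, op_letter);
    /* strchr also matches the terminating NUL, so reject '\0' explicitly. */
    assert(op_letter != '\0' && p != NULL);
    assert(length < (1u << 28));
    uint32_t op_code = (uint32_t)(p - CIGAR_OPS);
    return (length << 4) | op_code;
}

/* Recover the operation letter; unknown codes map to '?'. */
char cigar_op(uint32_t cigar)
{
    uint32_t code = cigar & 0xfu;
    return code < sizeof(CIGAR_OPS) - 1 ? CIGAR_OPS[code] : '?';
}

/* Recover the operation length. */
uint32_t cigar_len(uint32_t cigar)
{
    return cigar >> 4;
}
```

If the low four bits were lost or zeroed after the OR, every entry would decode as code 0, i.e. M, which matches the symptom described above.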
Dear Satta,
Thank you very much for your comments. I think this is very helpful.
Would you mind making a pull request with your changes, so that I can update the program easily?
Please let me know.
Yours,
Mengyao
Hi Mengyao, done, please see #39. I'd like to point out that moving to_cigar_int out of the header file may have performance implications, so you may want to test carefully. Fortunately there are no ABI breaks w.r.t. the current version, as we're only adding a new public symbol.
Dear Sascha,
Thank you very much.
I think this function won't influence speed much. The most time-consuming part is the matrix calculation.
Yours,
Mengyao
OK, thanks for the merge; I will incorporate it as a patch into the Debian package unless you would be willing to tag a new version :)
Dear Sascha,
Thank you for making the Debian package.
I think tagging a new version is better, so I made one here: https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library/releases
Would you mind testing it before updating the Debian package? This one hasn't been used much. Thank you so much.
Please let me know if you find any problems or have any comments.
Yours,
Mengyao
Sure, I can re-run the tests I did to trace the problem. However, it would be more natural if you could assign a version number in the tag that is greater than the last one ('v1.0'). The current one ('gcc') does not clearly indicate an order... that would be nice - thanks!
Dear Sascha,
Yes, you are right. I just changed it to v1.1.
Many thanks,
Mengyao
Looks good, the error seems to be gone now, and dependencies (like SPAdes) still pass their tests. I have updated the Debian package now, thanks for your work on this!
Dear Sascha,
Thank you. :-)
Yours,
Mengyao
Hi,
when running some test cases I noticed that sometimes the ssw_test tool prints binary characters in the BLAST-like output, apparently as a result of accessing uninitialized memory.
I just built it in a Debian stretch VM with a simple make (after adding -g to CFLAGS in the Makefile) using gcc version 5.4.0 20160609 (Debian 5.4.0-6). You can trigger the problematic behaviour using the test data included in the repo like this:
Here's some Valgrind output that might be helpful:
It looks like the second sequence is exceeded when printing. I'm not sure whether this is just a formatting issue or something to do with the SW implementation itself; the SAM output also results in memory access issues according to Valgrind.
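One way to narrow down whether the printing code or the alignment itself is at fault would be to check, before printing, that the CIGAR does not consume more bases than the reported alignment spans. The following is only a sketch under assumptions: it presumes BAM-style packed CIGAR values (length in the upper 28 bits, op code 0..8 for MIDNSHP=X in the lower 4 bits), and cigar_fits is a hypothetical helper, not a function of the SSW library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if walking the CIGAR stays within both alignment spans,
 * 0 if it would run past the end of the reference or the query. */
int cigar_fits(const uint32_t *cigar, size_t n_cigar,
               uint32_t ref_span, uint32_t query_span)
{
    uint64_t ref_used = 0, query_used = 0;

    assert(cigar != NULL || n_cigar == 0);
    for (size_t i = 0; i < n_cigar; ++i) {
        uint32_t len = cigar[i] >> 4;
        switch (cigar[i] & 0xfu) {
        case 0: case 7: case 8:   /* M, =, X consume both sequences */
            ref_used += len;
            query_used += len;
            break;
        case 1: case 4:           /* I, S consume the query only */
            query_used += len;
            break;
        case 2: case 3:           /* D, N consume the reference only */
            ref_used += len;
            break;
        default:                  /* H, P consume neither */
            break;
        }
    }
    return ref_used <= ref_span && query_used <= query_span;
}
```

For example, a CIGAR of 4M 2I 3D fits a reference span of 7 and a query span of 6, but if the same three entries were all decoded as M the walk would need 9 bases of each and the check would fail.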