Closed Francois75012 closed 5 years ago
What settings did you use?
What output did you get? What did you expect?
Marcus
On Jun 8, 2015, at 8:35 AM, Francois75012 notifications@github.com wrote:
two small sequences aligned perfectly well with swalign (ex. AGTCAGCCTGTAAACTGTAATATTTCTGATATGCAGATAATGCTTTTGAATAATCTTCCAATAAGAGGTTGAAGTG and CTTCCAATAGTCAGCCTGTAAACTGTAATATTTCTGAT). When it comes to larger sequences, the alignment is not working any more (ex. TGTACCCAGAGGATTTTGGAAACAGTTCAGGCTTTGGGGTGAACCTGTTAATCTTCGTGAACAACATGATGCTTTA and ATTCTTGCAATTGAAGGCACAGGTAGTGATACAGATGATAATATATATGGGGAAGAGAAGCAGGACATTGAGATTAATGTTGATCTTCGAGATGATGTTTTTGGATATCCTCATCAATTTGAAGACAAACCAGCATTAACTAAGACTGAAGATAGAAAAGAGTACAATATTGGTGTCCTAAGACACCTTCAGGTCGTCTTTGGTCATTTAGCTGCTTCCCGACTGCAATACTATGTACCCAGAGGATTTTGGAAACAGTTCAG) Could you explain why ?
— Reply to this email directly or view it on GitHub https://github.com/mbreese/swalign/issues/6.
Thanks for your fast answer Marcus.
I was puzzled by a so much difference in the output between swalign and emboss water.
In swalign, I used the following settings :
match = 2 mismatch = -1 gap_penalty = -1 gap_extension_penalty = -1 gap_extension_decay = 0.0 prefer_gaps = True verbose = False
I expect the same result as "water" gives :
(result from the EMBOSS webservice with the same sequences :
########################################
########################################
#
#
# #
1 1 TGTACCCAGAGGATTTTGGAAACAGTTCAG 30 |||||||||||||||||||||||||||||| 2 234 TGTACCCAGAGGATTTTGGAAACAGTTCAG 263
Result from swalign :
Query: 26 TGATA--CAGATGATAATATATATGGGGAAGAGAAG--CAGGACATTGAGATTAA--TGTTGATCTTCGAG-A-----TGATGTTTT 101 || || ||||.||| | | | |||| | .|| |||| |.|||.|.|.|| ||||.|||||||.| | |||||.||| Ref : 0 TG-TACCCAGAGGAT--T-T-T----GGAA-A-CAGTTCAGG-CTTTGGGGTGAACCTGTTAATCTTCGTGAACAACATGATGCTTT 75
Le 08/06/2015 18:59, Marcus Breese a écrit :
What settings did you use?
What output did you get? What did you expect?
Marcus
On Jun 8, 2015, at 8:35 AM, Francois75012 notifications@github.com wrote:
two small sequences aligned perfectly well with swalign (ex. AGTCAGCCTGTAAACTGTAATATTTCTGATATGCAGATAATGCTTTTGAATAATCTTCCAATAAGAGGTTGAAGTG and CTTCCAATAGTCAGCCTGTAAACTGTAATATTTCTGAT). When it comes to larger sequences, the alignment is not working any more (ex. TGTACCCAGAGGATTTTGGAAACAGTTCAGGCTTTGGGGTGAACCTGTTAATCTTCGTGAACAACATGATGCTTTA and ATTCTTGCAATTGAAGGCACAGGTAGTGATACAGATGATAATATATATGGGGAAGAGAAGCAGGACATTGAGATTAATGTTGATCTTCGAGATGATGTTTTTGGATATCCTCATCAATTTGAAGACAAACCAGCATTAACTAAGACTGAAGATAGAAAAGAGTACAATATTGGTGTCCTAAGACACCTTCAGGTCGTCTTTGGTCATTTAGCTGCTTCCCGACTGCAATACTATGTACCCAGAGGATTTTGGAAACAGTTCAG) Could you explain why ?
— Reply to this email directly or view it on GitHub https://github.com/mbreese/swalign/issues/6.
— Reply to this email directly or view it on GitHub https://github.com/mbreese/swalign/issues/6#issuecomment-110073185.
François PIUMI Unité Biologie du Développement et de la Reproduction INRA - Domaine de Vilvert - Jouy en Josas 78352 Cedex Tél: +(33) 1 34 65 21 09
The difference is in the weights / penalties applied. The algorithms used between the two programs should be the same… EMBOSS water seems to use 10 and 0.5 as the gap open and gap extension penalties.
So, if you use the same penalties, you end up with the same alignment:
$ swalign -gap 10 -gapext 0.5 TGTACCCAGAGGATTTTGGAAACAGTTCAGGCTTTGGGGTGAACCTGTTAATCTTCGTGAACAACATGATGCTTTA ATTCTTGCAATTGAAGG CACAGGTAGTGATACAGATGATAATATATATGGGGAAGAGAAGCAGGACATTGAGATTAATGTTGATCTTCGAGATGATGTTTTTGGATATCCTCATCAATTTGAAGACAAACCAGCATTAACT AAGACTGAAGATAGAAAAGAGTACAATATTGGTGTCCTAAGACACCTTCAGGTCGTCTTTGGTCATTTAGCTGCTTCCCGACTGCAATACTATGTACCCAGAGGATTTTGGAAACAGTTCAG
Query: cmdline (263 nt)
Ref : cmdline (76 nt)
Query: 234 TGTACCCAGAGGATTTTGGAAACAGTTCAG 263
||||||||||||||||||||||||||||||
Ref : 1 TGTACCCAGAGGATTTTGGAAACAGTTCAG 30
Score: 60
Matches: 30 (100.0%)
Mismatches: 0
CIGAR: 30M
For the library version, this would be setup something like this:
LocalAlignment(NucleotideScoringMatrix(2, -1), gap_penalty=-10, gap_extension_penalty=-0.5).align(ref, query)
Thank you very much Marcus, it now worked perfectly well !
Francois
Le 08/06/2015 23:16, Marcus Breese a écrit :
The difference is in the weights / penalties applied. The algorithms used between the two programs should be the same… EMBOSS water seems to use 10 and 0.5 as the gap open and gap extension penalties.
So, if you use the same penalties, you end up with the same alignment:
$ swalign -gap 10 -gapext 0.5 TGTACCCAGAGGATTTTGGAAACAGTTCAGGCTTTGGGGTGAACCTGTTAATCTTCGTGAACAACATGATGCTTTA ATTCTTGCAATTGAAGGCACAGGTAGTGATACAGATGATAATATATATGGGGAAGAGAAGCAGGACATTGAGATTAATGTTGATCTTCGAGATGATGTTTTTGGATATCCTCATCAATTTGAAGACAAACCAGCATTAACTAAGACTGAAGATAGAAAAGAGTACAATATTGGTGTCCTAAGACACCTTCAGGTCGTCTTTGGTCATTTAGCTGCTTCCCGACTGCAATACTATGTACCCAGAGGATTTTGGAAACAGTTCAG
For the library version, this would be setup something like this:
LocalAlignment(NucleotideScoringMatrix(2, -1), gap_penalty=-10, gap_extension_penalty=-0.5).align(ref, query)
On Jun 8, 2015, at 12:17 PM, Francois75012 notifications@github.com wrote:
Thanks for your fast answer Marcus.
I was puzzled by a so much difference in the output between swalign and emboss water.
In swalign, I used the following settings :
match = 2 mismatch = -1 gap_penalty = -1 gap_extension_penalty = -1 gap_extension_decay = 0.0 prefer_gaps = True verbose = False
I expect the same result as "water" gives :
(result from the EMBOSS webservice with the same sequences :
########################################
Program: water
Rundate: Mon 8 Jun 2015 20:13:13
Commandline: water
-auto
-stdout
-asequence emboss_water-I20150608-201312-0800-66302481-pg.asequence
-bsequence emboss_water-I20150608-201312-0800-66302481-pg.bsequence
-datafile EDNAFULL
-gapopen 10.0
-gapextend 0.5
-aformat3 pair
-snucleotide1
-snucleotide2
Align_format: pair
Report_file: stdout
########################################
=======================================
#
Aligned_sequences: 2
1: 1
2: 2
Matrix: EDNAFULL
Gap_penalty: 10.0
Extend_penalty: 0.5
#
Length: 30
Identity: 30/30 (100.0%)
Similarity: 30/30 (100.0%)
Gaps: 0/30 ( 0.0%)
Score: 150.0
# #
=======================================
1 1 TGTACCCAGAGGATTTTGGAAACAGTTCAG 30 |||||||||||||||||||||||||||||| 2 234 TGTACCCAGAGGATTTTGGAAACAGTTCAG 263
---------------------------------------
---------------------------------------
Result from swalign :
Query: 26
TGATA--CAGATGATAATATATATGGGGAAGAGAAG--CAGGACATTGAGATTAA--TGTTGATCTTCGAG-A-----TGATGTTTT
101 || || ||||.||| | | | |||| | .|| |||| |.|||.|.|.|| ||||.|||||||.| | |||||.||| Ref : 0
TG-TACCCAGAGGAT--T-T-T----GGAA-A-CAGTTCAGG-CTTTGGGGTGAACCTGTTAATCTTCGTGAACAACATGATGCTTT
75
Le 08/06/2015 18:59, Marcus Breese a écrit :
What settings did you use?
What output did you get? What did you expect?
Marcus
On Jun 8, 2015, at 8:35 AM, Francois75012 <notifications@github.com mailto:notifications@github.com> wrote:
two small sequences aligned perfectly well with swalign (ex.
AGTCAGCCTGTAAACTGTAATATTTCTGATATGCAGATAATGCTTTTGAATAATCTTCCAATAAGAGGTTGAAGTG and CTTCCAATAGTCAGCCTGTAAACTGTAATATTTCTGAT). When it comes to larger sequences, the alignment is not working any more (ex.
TGTACCCAGAGGATTTTGGAAACAGTTCAGGCTTTGGGGTGAACCTGTTAATCTTCGTGAACAACATGATGCTTTA and
ATTCTTGCAATTGAAGGCACAGGTAGTGATACAGATGATAATATATATGGGGAAGAGAAGCAGGACATTGAGATTAATGTTGATCTTCGAGATGATGTTTTTGGATATCCTCATCAATTTGAAGACAAACCAGCATTAACTAAGACTGAAGATAGAAAAGAGTACAATATTGGTGTCCTAAGACACCTTCAGGTCGTCTTTGGTCATTTAGCTGCTTCCCGACTGCAATACTATGTACCCAGAGGATTTTGGAAACAGTTCAG) Could you explain why ?
— Reply to this email directly or view it on GitHub <https://github.com/mbreese/swalign/issues/6 https://github.com/mbreese/swalign/issues/6>.
— Reply to this email directly or view it on GitHub
<https://github.com/mbreese/swalign/issues/6#issuecomment-110073185 https://github.com/mbreese/swalign/issues/6#issuecomment-110073185>.
François PIUMI Unité Biologie du Développement et de la Reproduction INRA - Domaine de Vilvert - Jouy en Josas 78352 Cedex Tél: +(33) 1 34 65 21 09
— Reply to this email directly or view it on GitHub https://github.com/mbreese/swalign/issues/6#issuecomment-110111486.
— Reply to this email directly or view it on GitHub https://github.com/mbreese/swalign/issues/6#issuecomment-110142677.
François PIUMI Unité Biologie du Développement et de la Reproduction INRA - Domaine de Vilvert - Jouy en Josas 78352 Cedex Tél: +(33) 1 34 65 21 09
Is it fair to say this issue is closed?
It should be, yes.
two small sequences aligned perfectly well with swalign (ex. AGTCAGCCTGTAAACTGTAATATTTCTGATATGCAGATAATGCTTTTGAATAATCTTCCAATAAGAGGTTGAAGTG and CTTCCAATAGTCAGCCTGTAAACTGTAATATTTCTGAT). When it comes to larger sequences, the alignment is not working any more (ex. TGTACCCAGAGGATTTTGGAAACAGTTCAGGCTTTGGGGTGAACCTGTTAATCTTCGTGAACAACATGATGCTTTA and ATTCTTGCAATTGAAGGCACAGGTAGTGATACAGATGATAATATATATGGGGAAGAGAAGCAGGACATTGAGATTAATGTTGATCTTCGAGATGATGTTTTTGGATATCCTCATCAATTTGAAGACAAACCAGCATTAACTAAGACTGAAGATAGAAAAGAGTACAATATTGGTGTCCTAAGACACCTTCAGGTCGTCTTTGGTCATTTAGCTGCTTCCCGACTGCAATACTATGTACCCAGAGGATTTTGGAAACAGTTCAG) Could you explain why ?