iTaxoTools / TaxI2-legacy

Calculates genetic differences between DNA sequences
GNU General Public License v3.0

Problem with alignment algorithm when sequences are very divergent #67

Closed: mvences closed this issue 3 years ago

mvences commented 3 years ago

We have been running some analyses with TaxI2 on real datasets for a study, and I noticed that the divergence values are unrealistically high in some cases. A closer look showed that the problem lies in the pairwise alignment algorithm, which performs poorly when the sequences are very divergent. Specifically, the algorithm inserts too few gaps in some of the pairwise alignments, so an excessive number of differences is counted.
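To illustrate the effect described above, here is a minimal sketch in Python. It assumes the divergence is an uncorrected p-distance counted only over columns where neither sequence has a gap; the function name `p_distance` and the toy sequences are hypothetical and are not taken from the TaxI2 code, which may compute distances differently.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among columns without a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


# Toy example: the second sequence is the first one shifted by one base.
ungapped = ("ACGTACGT", "CGTACGTA")    # aligner placed no gaps
gapped = ("ACGTACGT-", "-CGTACGTA")    # one gap at each end

d_ungapped = p_distance(*ungapped)
d_gapped = p_distance(*gapped)
print(d_ungapped, d_gapped)
```

In this toy case the same pair of sequences looks completely different without gaps and identical once a single gap is allowed at each end, which is the kind of inflation that too few gaps can cause.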

The problem can probably be solved rather easily by changing the parameters of the alignment algorithm. Before making any change, could you please remind me which alignment algorithm you used? I think it was the one from Biopython. In that case, it should be possible to set parameters such as "open gap penalty" and "extend gap penalty", and some others. Please confirm!

Then as next steps, probably we should do two things:

necrosovereign commented 3 years ago

The parameters can be changed by editing the data/scores.tab file.

mvences commented 3 years ago

The problem is that in the compiled standalone tool, the user can no longer access or modify this settings file. But for TaxI2 I would say this is not a priority. I have been experimenting with various settings, and the following seem appropriate:

gap penalty: -3
gap extend penalty: -1
end gap penalty: -1
end gap extend penalty: -1
match score: 1
mismatch score: -1
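As a rough illustration of how these values interact, the following Python sketch scores an already aligned pair of sequences. It assumes an affine gap model in which the first position of a gap costs the "gap penalty" and each further position costs the "gap extend penalty", with gaps touching either end of the alignment using the "end gap" values instead. The names `Scores`, `gap_runs` and `score_alignment` are hypothetical, and the conventions actually used by TaxI2 and by scores.tab may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    # Defaults are the values proposed in this comment.
    match: int = 1
    mismatch: int = -1
    gap_open: int = -3
    gap_extend: int = -1
    end_gap_open: int = -1
    end_gap_extend: int = -1


def gap_runs(seq: str):
    """Yield (start, length) for each run of '-' in an aligned sequence."""
    start = None
    for i, ch in enumerate(seq):
        if ch == "-":
            if start is None:
                start = i
        elif start is not None:
            yield start, i - start
            start = None
    if start is not None:
        yield start, len(seq) - start


def score_alignment(a: str, b: str, s: Scores = Scores()) -> int:
    """Score a given pairwise alignment (assumed affine gap model)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    # Match and mismatch scores for gap-free columns.
    for x, y in zip(a, b):
        if x != "-" and y != "-":
            total += s.match if x == y else s.mismatch
    # Gap scores: one "open" for the first position, "extend" for the rest.
    for seq in (a, b):
        for start, length in gap_runs(seq):
            terminal = start == 0 or start + length == len(seq)
            open_score = s.end_gap_open if terminal else s.gap_open
            extend_score = s.end_gap_extend if terminal else s.gap_extend
            total += open_score + extend_score * (length - 1)
    return total


ungapped = ("ACGTACGT", "CGTACGTA")
gapped = ("ACGTACGT-", "-CGTACGTA")

# With the proposed values the gapped alignment scores higher.
print(score_alignment(*ungapped), score_alignment(*gapped))

# Hypothetical harsher end-gap penalty: the ungapped alignment now wins.
harsh = Scores(end_gap_open=-10)
print(score_alignment(*ungapped, harsh), score_alignment(*gapped, harsh))
```

An aligner that maximizes this kind of score would pick the gapped alignment under the proposed values, but the ungapped one under the harsher, purely illustrative end-gap penalty, which is consistent with too few gaps being inserted when gaps are penalized too heavily.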

I could apply this change myself, but it is probably better that you do it, so that you keep control over everything that happens with the code in the repository. Could you please change the settings in the scores.tab files of both TaxI2 and TaxI3 to the values given above? This will then close the present issue.

I will then create a new issue for TaxI3, asking for an option that lets the user set and modify these alignment settings directly from the GUI. This is low priority and the issue can be left open for now; I will just create it to make sure we don't forget it once we start overhauling the GUI.