torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

Alignment and identity percentage in UCLUST file may be wrong #94

Closed torognes closed 7 years ago

torognes commented 7 years ago

A bug has been identified in Swarm that may lead to wrong alignments (CIGAR strings) and incorrect identity percentages in the UCLUST (.uc) files written by Swarm when run with the -u option. The cause is that incorrect gap penalties are used. We are working on a solution.

torognes commented 7 years ago

This issue has been resolved in Swarm version 2.1.10, which has just been released.

frederic-mahe commented 7 years ago

Hi @torognes, would you happen to have a toy example I could use to make a regression test for that particular issue?

torognes commented 7 years ago

This bug is the first issue described by Robert Mueller in his email of 8 December 2016:

The first issue is related to calls of the nw() method in algo_run() and algo_d1_run(). In both cases, nw() is called with gapopen, gapextend and score_matrix_63. However, score_matrix_63 is filled with penalty_mismatch in score_matrix_read() and (as far as I have seen) is not changed afterwards. Hence, it seems to me that the untransformed (gapopen, gapextend) and the transformed (penalty_mismatch) scoring functions are mixed up here. The (by default deactivated) call of nw() in scan.cc (search_chunk()), in contrast, is done (as I would say) correctly with penalty_gapopen, penalty_gapextend and score_matrix_63.

Sorry, but I do not think I have any sequences that could be used to test this. There was an obvious mix-up of values, which I have simply corrected.

The bug would result in wrong (untransformed) gap penalties being used when computing the cigar string and identity percentages in the .uc files.
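To illustrate why mixing the two penalty scales matters, here is a minimal Python sketch of a minimum-cost global alignment with affine gaps. It is not Swarm's code: the function names, the toy sequences, the penalty values, the gap cost convention (open + k × extend for a gap of length k) and the identity definition (matching columns divided by alignment length) are all assumptions chosen for illustration only.

```python
from itertools import groupby

INF = (float("inf"), "")


def align(a, b, mismatch, gap_open, gap_extend):
    """Minimum-cost global alignment with affine gaps (toy, Gotoh-style).

    Each DP cell stores (cost, ops); ops is a string over
    M (match/mismatch column), D (gap in b), I (gap in a).
    Storing full op strings is wasteful but keeps the sketch short.
    """
    n, m = len(a), len(b)
    M = [[INF] * (m + 1) for _ in range(n + 1)]  # ends in an aligned column
    D = [[INF] * (m + 1) for _ in range(n + 1)]  # ends in a gap in b
    I = [[INF] * (m + 1) for _ in range(n + 1)]  # ends in a gap in a
    M[0][0] = (0, "")
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev = min(M[i - 1][j - 1], D[i - 1][j - 1], I[i - 1][j - 1])
                cost = 0 if a[i - 1] == b[j - 1] else mismatch
                M[i][j] = (prev[0] + cost, prev[1] + "M")
            if i > 0:
                start = min(M[i - 1][j], I[i - 1][j])  # open a new gap
                ext = D[i - 1][j]                      # extend an existing gap
                D[i][j] = min((start[0] + gap_open + gap_extend, start[1] + "D"),
                              (ext[0] + gap_extend, ext[1] + "D"))
            if j > 0:
                start = min(M[i][j - 1], D[i][j - 1])
                ext = I[i][j - 1]
                I[i][j] = min((start[0] + gap_open + gap_extend, start[1] + "I"),
                              (ext[0] + gap_extend, ext[1] + "I"))
    return min(M[n][m], D[n][m], I[n][m])


def cigar_and_identity(a, b, ops):
    """Run-length encode ops and compute percent identity over all columns."""
    i = j = matches = 0
    for op in ops:
        if op == "M":
            matches += a[i] == b[j]
            i += 1
            j += 1
        elif op == "D":
            i += 1
        else:
            j += 1
    cigar = "".join(f"{len(list(group))}{op}" for op, group in groupby(ops))
    return cigar, 100.0 * matches / len(ops)


if __name__ == "__main__":
    a, b = "AACCGGTT", "ACCGGTTA"  # hypothetical toy sequences
    # Illustrative values only: one set where the mismatch and gap penalties
    # are on the same scale, one where the gap penalties are on a smaller scale.
    for label, params in (("consistent", (18, 24, 13)), ("mixed", (18, 12, 4))):
        cost, ops = align(a, b, *params)
        cigar, identity = cigar_and_identity(a, b, ops)
        print(label, cost, cigar, round(identity, 2))
```

With these toy inputs, the consistent penalties favour a gapless alignment (cost 72, 50% identity), while the mixed penalties make gaps cheap enough that a two-gap alignment wins (cost 32, about 77.8% identity). A pair of sequences with this property might serve as a starting point for a regression test, although whether it triggers the original bug in Swarm itself would need to be checked against the actual code.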