refresh-bio / FAMSA

Algorithm for ultra-scale multiple sequence alignments (3M protein sequences in 5 minutes and 24 GB of RAM)
GNU General Public License v3.0

Fix off-by-one error in `GappedSequence` constructor #48

Closed: althonos closed this 2 months ago

althonos commented 3 months ago

Hi @agudys!

There is an off-by-one error in the encoding constructor of `GappedSequence`. The symbols should be written to the symbol array starting from offset 1, but the constructor currently writes them starting from offset 0. This corrupts the internal sequence representation and breaks decoding as well. Luckily, this constructor is not actually used anywhere in FAMSA, but when I tried wrapping it in PyFAMSA I started getting weird issues and traced them back here!
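To make the failure mode concrete, here is a minimal, hypothetical sketch, not FAMSA's actual `GappedSequence` code. It assumes only what the issue states: residues belong at offsets 1..n of the symbol array. The type `GappedSequenceSketch`, the `GUARD` value placed in slot 0, the `buggy` flag and the 20-letter alphabet are all illustrative inventions. Writing from offset 0 shifts every residue one slot left, so a decoder that reads from offset 1 loses the first residue and reads an unwritten slot at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alphabet: residue character <-> small integer code.
// This is NOT FAMSA's real encoding table.
const std::string kAlphabet = "ARNDCQEGHILKMFPSTWYV";

using symbol_t = unsigned char;

// Hypothetical placeholder for the reserved slot at offset 0.
constexpr symbol_t GUARD = 0xFF;

symbol_t encode_residue(char c) {
    const std::size_t pos = kAlphabet.find(c);
    // Unknown residues map to one past the alphabet (decoded as 'X').
    return static_cast<symbol_t>(pos == std::string::npos ? kAlphabet.size() : pos);
}

struct GappedSequenceSketch {
    std::vector<symbol_t> symbols;  // symbols[0] reserved; residues at 1..size
    std::size_t size = 0;

    // `buggy` reproduces the reported behaviour (writing from offset 0).
    explicit GappedSequenceSketch(const std::string& residues, bool buggy = false)
        : symbols(residues.size() + 1, GUARD), size(residues.size()) {
        const std::size_t offset = buggy ? 0 : 1;
        for (std::size_t i = 0; i < size; ++i)
            symbols[i + offset] = encode_residue(residues[i]);
        // With the fixed offset, the reserved slot must stay untouched.
        assert(buggy || symbols[0] == GUARD);
    }

    // Decoding assumes the 1-based layout, as the issue describes.
    std::string decode() const {
        std::string out;
        for (std::size_t i = 1; i <= size; ++i) {
            const std::size_t s = symbols[i];
            out.push_back(s < kAlphabet.size() ? kAlphabet[s] : 'X');
        }
        return out;
    }
};
```

With the fixed offset, `GappedSequenceSketch("MKV").decode()` round-trips to `"MKV"`. With `buggy = true`, the same input decodes to `"KVX"`: the leading `M` is dropped and the trailing slot still holds the placeholder. The encoder and decoder must agree on where index 0 sits, and that agreement is exactly what the off-by-one breaks.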

agudys commented 2 months ago

@althonos Thank you, Martin! The bugfix has been merged through our development repository, together with several other changes.