This pull request fixes a bug caused by mafft shortening sequence names. In phydms_prepalignment function MAFFT_CDSandProtAlignments there are two dictionaries keyed by prealignment sequence names. However, mafft trims sequence names to 253 characters so there was mismatch of sequence names pre- vs. post-alignment if the original name was greater than 253 characters. The current solution was to create this dictionaries using only the first 253 characters. The script now also checks that trimming to the first 253 characters still results in unique sequence names.
There is one major downside to this solution. I could not find a reference to the shortening of sequence names in the mafft documentation. This means that if mafft changes this behavior, which seems unlikely, phydms_prepalignment will break again. A more complicated solution would be to check if the post-alignment name is a substring of the pre-alignment name but it is difficult to key a dictionary this way.
Because a mafft behavior change seems unlikely, I went with the simple fix. But I understand if @jbloom thinks a refactor is worth it for more robust behavior.
This pull request fixes a bug caused by mafft shortening sequence names. In
phydms_prepalignment
functionMAFFT_CDSandProtAlignments
there are two dictionaries keyed by prealignment sequence names. However,mafft
trims sequence names to 253 characters so there was mismatch of sequence names pre- vs. post-alignment if the original name was greater than 253 characters. The current solution was to create this dictionaries using only the first 253 characters. The script now also checks that trimming to the first 253 characters still results in unique sequence names.There is one major downside to this solution. I could not find a reference to the shortening of sequence names in the
mafft
documentation. This means that ifmafft
changes this behavior, which seems unlikely,phydms_prepalignment
will break again. A more complicated solution would be to check if the post-alignment name is a substring of the pre-alignment name but it is difficult to key a dictionary this way.Because a
mafft
behavior change seems unlikely, I went with the simple fix. But I understand if @jbloom thinks a refactor is worth it for more robust behavior.This pull request closes issue #25