naturalis / barcode-constrained-phylogeny

Pipeline for building topologically-constrained phylogenies from DNA barcode data
https://naturalis.github.io/barcode-constrained-phylogeny/
Apache License 2.0

hmmalign parameterization #67

Status: Closed (rvosa closed 9 months ago)

rvosa commented 11 months ago

The current wrapper around hmmalign has produced unaligned sequences (e.g. in Naomi's report). This may be because the output, which is in Stockholm format, is not parsed and interpreted correctly. Switching to a different output format looks less error-prone. For example, hmmalign --outformat afa [options] [hmm] [query] writes the result as aligned FASTA. A quick check with some variable sequences (quite distantly related primates, with COI sequences of different lengths) still produced pretty good alignments:

[Screenshot: alignment of the test sequences, 2023-11-08 16:58:19]
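
A minimal sketch of what such a wrapper could look like, assuming hmmalign is on the PATH and writes the alignment to standard output. The function names (build_hmmalign_cmd, run_hmmalign, parse_aligned_fasta, is_aligned) are hypothetical and are not taken from the pipeline's code. The sanity check at the end only verifies that all rows have the same length, which is the symptom the unaligned output would fail.

```python
import subprocess


def build_hmmalign_cmd(hmm, query, outformat="afa", trim=False):
    """Build the argument list for: hmmalign --outformat <fmt> [--trim] <hmm> <query>."""
    cmd = ["hmmalign", "--outformat", outformat]
    if trim:
        cmd.append("--trim")
    cmd += [str(hmm), str(query)]
    return cmd


def run_hmmalign(hmm, query, outformat="afa", trim=False):
    """Run hmmalign and return its standard output as text (requires HMMER)."""
    cmd = build_hmmalign_cmd(hmm, query, outformat=outformat, trim=trim)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def parse_aligned_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped sequence lines."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def is_aligned(records):
    """True if every row has the same length, i.e. the rows form a rectangular alignment."""
    return len({len(seq) for seq in records.values()}) <= 1
```

Hypothetical usage, with placeholder file names: records = parse_aligned_fasta(run_hmmalign("COI.hmm", "query.fa")), followed by a check that is_aligned(records) holds before passing the alignment on.
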

rvosa commented 9 months ago

This issue is now solved. One problem was that the alignment needs to be trimmed (with --trim) so that it only covers the region on which the input HMM was trained. Another was possibly the Stockholm import. Stockholm is useful for testing the orientation because it includes posterior probabilities, which we use, but importing the alignment itself from that format seems to have had some issues. We now use PHYLIP for the alignment itself.
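
To illustrate the orientation test, here is a rough sketch of how posterior probabilities could be read from Stockholm output without importing the alignment itself. It assumes the per-residue annotation lines have the form "#=GR <name> PP <string>", where digits 0-9 and "*" encode increasing confidence and "." marks gap columns. The scoring rule (mean of the encoded values, higher wins) and the function names are assumptions for illustration only, not the pipeline's actual method.

```python
def mean_posterior(stockholm_text, seq_name):
    """Average the PP annotation of one sequence; '*' counts as 10, digits as their value."""
    scores = []
    for line in stockholm_text.splitlines():
        parts = line.split()
        # Per-residue annotation lines look like: #=GR <seqname> PP <string>
        if len(parts) == 4 and parts[0] == "#=GR" and parts[1] == seq_name and parts[2] == "PP":
            for ch in parts[3]:
                if ch == "*":
                    scores.append(10)
                elif ch.isdigit():
                    scores.append(int(ch))
                # Any other character (e.g. '.' in gap columns) is skipped.
    return sum(scores) / len(scores) if scores else 0.0


def pick_orientation(forward_sto, reverse_sto, seq_name):
    """Return 'forward' or 'reverse', whichever alignment has the higher mean PP."""
    fwd = mean_posterior(forward_sto, seq_name)
    rev = mean_posterior(reverse_sto, seq_name)
    return "forward" if fwd >= rev else "reverse"
```

Here forward_sto and reverse_sto would be the Stockholm output of aligning the sequence as given and its reverse complement, respectively. The alignment that is kept would then be produced in a separate run with --trim and the chosen output format.
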