rbturnbull / orthoflow

Orthoflow is a workflow for phylogenetic inference of genome-scale datasets of protein-coding genes.
https://rbturnbull.github.io/orthoflow/
Apache License 2.0

Multiple sequences with the same taxon name in ortholog files breaks supermatrix concatenation #28

Closed: rbturnbull closed this issue 2 years ago

rbturnbull commented 2 years ago

Having multiple sequences with the same taxon name in an ortholog file breaks supermatrix concatenation.

In the fourth ortholog file (tests/test-data/results/orthologs/OG0000004.fa), there are four sequences labelled with the taxon 'Derbesia_sp_WEST4838'.

In the alignment module, once everything except the taxon name is stripped from each sequence ID, these four sequences end up with exactly the same ID, and this breaks their concatenation into the supermatrix.
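To illustrate why identical IDs cause trouble, here is a minimal sketch that flags ortholog files in which two or more sequences reduce to the same taxon name. It assumes, hypothetically, that a sequence ID has the form `taxon|rest` and that stripping keeps only the part before the first `|`; the real ID format and stripping rule used by Orthoflow's alignment module are not shown in this issue. The helpers `taxon_from_id`, `parse_fasta` and `duplicate_taxa` are illustrative names, not Orthoflow functions, and this is not the fix that was actually committed.

```python
from collections import Counter


def taxon_from_id(seq_id: str, sep: str = "|") -> str:
    """Hypothetical stripping rule: keep only the text before the first separator."""
    return seq_id.split(sep, 1)[0]


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Minimal FASTA parser returning (id, sequence) pairs in file order."""
    records: list[tuple[str, str]] = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            seq_id, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def duplicate_taxa(records: list[tuple[str, str]]) -> dict[str, int]:
    """Return {taxon: count} for every taxon that appears more than once after stripping."""
    counts = Counter(taxon_from_id(seq_id) for seq_id, _ in records)
    return {taxon: n for taxon, n in counts.items() if n > 1}


# Toy ortholog file (made-up IDs and sequences, for illustration only).
example = """>Derbesia_sp_WEST4838|seqA
MKV-L
>Derbesia_sp_WEST4838|seqB
MKI-L
>Bryopsis_plumosa|seqC
MKVAL
"""

records = parse_fasta(example)
print(duplicate_taxa(records))

# Keying sequences by taxon name silently drops one Derbesia sequence,
# so the rows no longer line up when blocks are joined into a supermatrix.
by_taxon = {taxon_from_id(seq_id): seq for seq_id, seq in records}
print(len(records), len(by_taxon))
```

Running a check like this before concatenation makes the problem explicit instead of letting a later step fail or silently keep only one sequence per taxon. How the duplicates should then be resolved (for example, which copy to keep) is a separate design decision that this sketch does not make.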

rbturnbull commented 2 years ago

This was fixed in an earlier commit.