naturalis / barcode-constrained-phylogeny

Pipeline for building topologically-constrained phylogenies from DNA barcode data
https://naturalis.github.io/barcode-constrained-phylogeny/
Apache License 2.0

OpenToL 'broken' taxa #81

Closed rvosa closed 6 months ago

rvosa commented 7 months ago

When requesting a subtree by its tip OTT IDs, one or more of the IDs may belong to species that OpenToL views as 'broken', for example Alouatta seniculus or Callicebus personatus.

This is because at the subspecies level they are entangled with other species (respectively Alouatta sara and Callicebus coimbrai). In the requested subtree, such broken tips are only identified by a label of the form mrcaott126846ott126847, which means we can't immediately figure out which species triggered this.

The solution appears to be to:

  1. parse the label to get the IDs (126846 and 126847)
  2. request the induced subtree for those IDs
  3. fetch the tip labels
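
A minimal Python sketch of these three steps, not the code from the eventual commit. The function names are hypothetical. The endpoint URL and the `ott_ids` / `newick` JSON fields reflect my understanding of the Open Tree of Life v3 API and should be checked against its documentation. The tip extraction is a simple regex that assumes unquoted Newick labels.

```python
import json
import re
import urllib.request

# Assumption: the v3 induced_subtree endpoint of the Open Tree of Life API.
INDUCED_SUBTREE_URL = "https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree"

# Labels for 'broken' tips look like mrcaott126846ott126847.
MRCA_LABEL = re.compile(r"^mrcaott(\d+)ott(\d+)$")


def parse_mrca_label(label):
    """Step 1: return the two OTT IDs in an 'mrcaott<A>ott<B>' label, or None."""
    match = MRCA_LABEL.match(label)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def fetch_induced_newick(ott_ids, url=INDUCED_SUBTREE_URL):
    """Step 2: POST the IDs and return the Newick string of the induced subtree.

    Assumption: the service accepts {"ott_ids": [...]} and answers with a
    JSON object that has a "newick" field.
    """
    payload = json.dumps({"ott_ids": list(ott_ids)}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["newick"]


def tip_labels(newick):
    """Step 3: collect tip labels, i.e. labels that directly follow '(' or ','.

    Internal node labels follow ')' and are therefore skipped. Quoted labels
    containing commas or parentheses are not handled by this simple pattern.
    """
    found = re.findall(r"[(,]\s*([^(),:;]+)", newick)
    return [label.strip() for label in found if label.strip()]
```

For the label in this issue, `parse_mrca_label("mrcaott126846ott126847")` yields the two IDs, which can be passed to `fetch_induced_newick` and the result to `tip_labels`. A proper Newick parser would be more robust than the regex if labels can be quoted.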

Whichever tip label matches, in its first parts, a species in the input (unaligned.fa) that hasn't already been seen in previous output should be the one to keep and graft onto the tree.
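
The selection rule could be sketched in Python as follows. This assumes that the "first parts" of a tip label are its genus and species epithet, that labels are underscore- or space-separated (for example `Genus_species_ott123`), and that the species names from unaligned.fa have already been collected into a set. The function names are hypothetical and the OTT IDs in the usage note are placeholders.

```python
import re


def species_key(tip_label):
    """Return the first two parts of a label as 'Genus species'."""
    parts = re.split(r"[_\s]+", tip_label.strip())
    return " ".join(parts[:2])


def pick_tip_to_graft(tip_labels, input_species, already_seen):
    """Return the first tip whose species is in the input but not yet in the output.

    tip_labels    -- labels from the induced subtree of the two mrca IDs
    input_species -- set of 'Genus species' names present in the input
    already_seen  -- set of 'Genus species' names already placed in the tree
    """
    for label in tip_labels:
        key = species_key(label)
        if key in input_species and key not in already_seen:
            return label
    return None  # nothing matched; the caller decides how to handle this
```

With tips `["Alouatta_sara_ott1", "Alouatta_seniculus_ott2"]`, both species in the input, and Alouatta sara already seen, the function returns the Alouatta seniculus label. Subspecies tips collapse to their species name because only the first two parts are compared.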

rvosa commented 6 months ago

Implemented as of commit https://github.com/naturalis/barcode-constrained-phylogeny/commit/194c9633a3a587a9604a84a0bf82bfb6b0c49fdd