matsengrp / historydag

https://matsengrp.github.io/historydag
GNU General Public License v3.0

Sankoff with multiple criteria #55

Open marybarker opened 1 year ago

marybarker commented 1 year ago

It might be useful to generalize the Sankoff algorithm to admit both sequence and geographic data.

This can be done naively for the two-region case by appending to each existing sequence a binary-valued character that encodes the geographic region of interest. However, this approach has two limitations that we would like to get around:

One way to get around this would be to create separate node attributes, as @willdumm suggests in this comment, each with its own cost matrix.
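As a rough illustration of the separate-attributes idea, here is a minimal sketch in plain Python. It does not use the historydag API: the `Label` type, the region names, the cost values, and the `region_weight` parameter are all hypothetical and chosen only for the example.

```python
from typing import NamedTuple


class Label(NamedTuple):
    """Hypothetical node label carrying one field per attribute."""
    sequence: str
    region: str


# Hypothetical cost matrices, one per attribute.
BASE_COST = {(a, b): 0 if a == b else 1 for a in "ACGT" for b in "ACGT"}
REGION_COST = {
    ("north", "north"): 0, ("north", "south"): 2,
    ("south", "north"): 3, ("south", "south"): 0,
}


def edge_cost(parent: Label, child: Label, region_weight: float = 1.0) -> float:
    """Sum the per-site sequence cost and the weighted region cost on one edge."""
    seq_cost = sum(BASE_COST[p, c] for p, c in zip(parent.sequence, child.sequence))
    return seq_cost + region_weight * REGION_COST[parent.region, child.region]


parent = Label("ACGT", "north")
child = Label("ACTT", "south")
cost = edge_cost(parent, child)  # one substitution plus a north-to-south move
```

Unlike the appended binary character, a separate attribute can take more than two values and can have an asymmetric cost matrix, as `REGION_COST` does here.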

In this case, we would need to couple the attributes carefully. A potential problem when optimizing cost using multiple criteria is that the overall cost can achieve its optimal value at a combination that is not a local optimum for any of the attributes independently. That is, the overall cost of the choice at a given node should be realized for the entire set of attributes on each tree below it.
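One way to keep the attributes coupled is to run the Sankoff recursion over joint states, so that every cost stored at a node is realized by a single labeling of all attributes below it. The following is only a sketch under assumptions: a single site, two hypothetical regions, unit costs, and a tree encoded as nested lists with `(base, region)` tuples at the leaves. None of these names come from historydag.

```python
from itertools import product

BASES = "ACGT"
REGIONS = ("north", "south")              # hypothetical region names
STATES = list(product(BASES, REGIONS))    # joint (base, region) states


def joint_cost(parent, child):
    """Hypothetical joint transition cost; a coupling term could be added here."""
    base_cost = 0 if parent[0] == child[0] else 1
    region_cost = 0 if parent[1] == child[1] else 1
    return base_cost + region_cost


def sankoff(node):
    """Map each joint state to the minimum cost of the subtree rooted at node."""
    if isinstance(node, tuple):  # leaf: the observed (base, region) pair
        return {s: (0 if s == node else float("inf")) for s in STATES}
    child_tables = [sankoff(child) for child in node]
    return {
        s: sum(min(joint_cost(s, t) + table[t] for t in STATES)
               for table in child_tables)
        for s in STATES
    }


# Internal nodes are lists of children; leaves are (base, region) tuples.
tree = [[("A", "north"), ("A", "south")], [("C", "south"), ("C", "south")]]
root_table = sankoff(tree)
best_cost = min(root_table.values())
```

The joint state space grows as the product of the attribute alphabets, so this is practical only when the number of regions is small.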

marybarker commented 1 year ago

With some care, this might also be generalized to allow a constrained Sankoff-geography method, where the sequences are resolved first, and then geography is inferred based on the optimal sequence labeling.
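One possible reading of the constrained method is a lexicographic ordering of costs: minimize the sequence cost first, and use the geography cost only to break ties among sequence-optimal labelings. The sketch below assumes that reading, a single site, two hypothetical regions, and unit costs. It is not the historydag implementation.

```python
from itertools import product

BASES = "ACGT"
REGIONS = ("north", "south")              # hypothetical region names
STATES = list(product(BASES, REGIONS))
INF = (float("inf"), float("inf"))


def edge_cost(parent, child):
    """Return the pair (sequence cost, geography cost) for one edge."""
    return (int(parent[0] != child[0]), int(parent[1] != child[1]))


def add(x, y):
    """Add two cost pairs componentwise."""
    return (x[0] + y[0], x[1] + y[1])


def lex_sankoff(node):
    """Sankoff recursion where cost pairs are compared lexicographically."""
    if isinstance(node, tuple):  # leaf: the observed (base, region) pair
        return {s: ((0, 0) if s == node else INF) for s in STATES}
    tables = [lex_sankoff(child) for child in node]
    result = {}
    for s in STATES:
        total = (0, 0)
        for table in tables:
            # Python compares tuples lexicographically, so min() prefers
            # lower sequence cost and only then lower geography cost.
            total = add(total, min(add(edge_cost(s, t), table[t]) for t in STATES))
        result[s] = total
    return result


tree = [[("A", "north"), ("A", "south")], [("C", "south"), ("C", "south")]]
best = min(lex_sankoff(tree).values())
```

With fully separable costs on a single tree, as here, this gives the same result as optimizing each attribute on its own. The ordering would only matter if the geography cost depended on the sequence labeling, or if the choice were among several candidate trees.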