StefanFlaumberg opened this issue 1 month ago.
@StefanFlaumberg Thanks for the suggestion! This is a good idea. We are currently busy with various projects, but I will consider doing so, perhaps in the coming few weeks or months.
Hi Stefan,
Related to this, we are working on a different solution to this problem. I'm not totally convinced that MAST is the right way to go here. I like the idea in principle (as I like all ideas for making the different avenues of IQ-TREE work together), but the problem is that orthogonal mixture classes are multiplicative. So, if you have e.g. 5 MAST trees (i.e. tree classes), 60 profiles (i.e. frequency classes), and a +R4 model (i.e. 4 rate classes), then every site has 5 × 60 × 4 = 1200 likelihoods to calculate, and any estimation will need roughly 1200 times the RAM of estimating a single likelihood per site.
Because of this, anything we can do to rein in the number of classes is useful. One way is to assume a single tree.
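The class arithmetic behind the 1200 figure can be sketched in a few lines of Python. The helper names and the 10,000-site alignment are hypothetical, and the byte count assumes one double-precision likelihood per site per combined class, which is only a lower bound on what a real implementation stores (partial likelihoods per node and state come on top of that).

```python
from math import prod


def combined_classes(*class_counts: int) -> int:
    """Orthogonal mixture components are crossed, so their class counts multiply."""
    return prod(class_counts)


def site_likelihood_bytes(n_sites: int, n_classes: int, bytes_per_value: int = 8) -> int:
    """Bytes for one double-precision likelihood per site per combined class.

    Assumption for illustration only: this ignores per-node and per-state
    partial likelihoods, so real memory use is considerably higher.
    """
    return n_sites * n_classes * bytes_per_value


# 5 MAST trees x 60 frequency profiles x 4 rate classes (+R4)
n_classes = combined_classes(5, 60, 4)
print(n_classes)

# Hypothetical alignment of 10,000 sites
print(site_likelihood_bytes(10_000, n_classes) / 1e6, "MB")
```

Dropping the tree mixture (a single fixed tree) divides every one of these numbers by 5, which is the point of assuming a tree.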
So, another solution to the circular problem is to do what phylogenetics programs already do internally, and:
W.r.t. convergence, you could look at the correlation of the Q matrix from one iteration to the next. Le and Gascuel did that for the LG model, and we copied them for the QMaker paper (I think we required the correlation to be > 0.999). We have been using the same approach for many Q-matrix estimates, and in my experience the process almost never goes beyond 2 iterations (even if the tree changes a decent amount after the first iteration), suggesting that in most cases the tree is not too important for estimating the Q matrix.
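A minimal Python sketch of that stopping rule, assuming the correlation is a Pearson correlation over the off-diagonal (upper-triangular) entries of successive symmetric exchangeability matrices; which entries are actually compared in LG/QMaker is an assumption here. `estimate_tree` and `estimate_q` are hypothetical placeholders for the real inference steps (for example separate IQ-TREE runs). This is not the QMaker code, only an illustration of the iterate-and-compare loop.

```python
from math import sqrt
from typing import Any, Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def upper_triangle(q: Matrix) -> List[float]:
    """Off-diagonal upper-triangular entries, i.e. the exchangeabilities of a symmetric matrix."""
    n = len(q)
    return [q[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def q_correlation(q_old: Matrix, q_new: Matrix) -> float:
    """Correlation between the exchangeabilities of two successive Q matrices."""
    return pearson(upper_triangle(q_old), upper_triangle(q_new))


def iterate_until_converged(
    q: Matrix,
    estimate_tree: Callable[[Matrix], Any],  # hypothetical: infer tree(s) under the current Q
    estimate_q: Callable[[Any], Matrix],     # hypothetical: re-estimate Q on those tree(s)
    threshold: float = 0.999,
    max_iter: int = 10,
) -> Tuple[Any, Matrix, int]:
    """Alternate tree and Q estimation until successive Q matrices correlate above threshold."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    for iteration in range(1, max_iter + 1):
        tree = estimate_tree(q)
        q_new = estimate_q(tree)
        r = q_correlation(q, q_new)
        q = q_new
        if r > threshold:
            return tree, q, iteration
    return tree, q, max_iter
```

Returning the iteration count makes it easy to check the observation above that the loop rarely needs more than 2 rounds.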
I hope some of that helps.
Rob
Dear IQ-TREE team,
In a recent paper you have shown that re-estimating the substitution matrix under a profile mixture model on a database of relevant sequences (resulting in a GTRpmix matrix) may improve phylogenetic reconstruction accuracy. However, the re-estimation itself needs a guide tree, which poses a self-reference problem: one would like to re-estimate the matrix to improve reconstruction of the very tree that is being used as the guide tree. Put more briefly, the true topology of the guide tree is usually unknown. Fortunately, in practice we usually know the general topology of a species tree and are unsure about only a few bipartitions in it. This suggests an elegant solution: use the tree-mixture model (MAST) with equal tree weights during GTRpmix matrix estimation to express our partial knowledge about the guide-tree topology.
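For reference, a tree mixture scores each site as a weighted sum of its likelihoods under each tree, so equal tree weights simply mean w_t = 1/T for T trees. The Python sketch below computes that log-likelihood from per-tree site log-likelihoods, using log-sum-exp for numerical stability. The function names and input layout are hypothetical; in a real MAST analysis the per-tree site likelihoods (each tree with its own branch lengths) would come from the program itself, and whether the weights are held fixed rather than estimated is exactly the modelling choice proposed here.

```python
from math import exp, log
from typing import Optional, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


def mast_log_likelihood(
    site_logliks: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Log-likelihood of an alignment under a tree mixture.

    site_logliks[t][s] is log L(site s | tree t). If weights is None,
    every tree gets the same fixed weight 1/T (all weights must be > 0).
    """
    n_trees = len(site_logliks)
    if weights is None:
        weights = [1.0 / n_trees] * n_trees
    n_sites = len(site_logliks[0])
    total = 0.0
    for s in range(n_sites):
        # log sum_t w_t * L_t(s), computed in log space
        total += logsumexp([log(w) + site_logliks[t][s] for t, w in enumerate(weights)])
    return total


# Two trees that disagree at one site: equal weights give approximately
# log(0.5 * 0.2 + 0.5 * 0.4) = log(0.3)
print(mast_log_likelihood([[log(0.2)], [log(0.4)]]))
```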
Currently MAST works well with frequency profile mixtures, but it cannot link the GTR20 matrix parameters across the frequency profiles. One gets a segmentation fault when trying to include the `--link-exchange-rates` option, like this:
Could you please implement the `--link-exchange-rates` option in the MAST model so that this approach works? Thank you!
Best,
Stefan