stamatak opened this issue 7 years ago (status: open)
Rate categories are already free in the library. In fact there is no shape parameter in the partition but just the number of discrete rate categories and their values.
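To make the distinction concrete: under a discrete gamma model the category rates are derived from a single shape parameter α. A free-rate model instead stores K rates r_k and weights w_k directly, usually normalized so that the mean rate Σ w_k r_k equals 1, and a site's likelihood is the weighted sum of its per-category likelihoods. The Python sketch below shows only that arithmetic. The function names are hypothetical and are not libpll's API, and the per-category site likelihood is passed in as a stand-in callable rather than computed on a tree.

```python
from typing import Callable, List, Tuple


def normalize_free_rates(rates: List[float],
                         weights: List[float]) -> Tuple[List[float], List[float]]:
    """Rescale a free-rate model so weights sum to 1 and the mean rate is 1.

    There is no shape parameter here: the model is just K categories,
    each with its own rate r_k and weight w_k (hypothetical helper).
    """
    if len(rates) != len(weights) or not rates:
        raise ValueError("need exactly one weight per rate category")
    wsum = sum(weights)
    w = [x / wsum for x in weights]
    # Weighted mean rate; dividing by it makes sum_k w_k * r_k == 1.
    mean_rate = sum(wk * rk for wk, rk in zip(w, rates))
    r = [rk / mean_rate for rk in rates]
    return r, w


def site_likelihood(site_lh_given_rate: Callable[[float], float],
                    rates: List[float], weights: List[float]) -> float:
    """Mixture likelihood of one site: sum_k w_k * L(site | rate r_k)."""
    return sum(wk * site_lh_given_rate(rk) for wk, rk in zip(weights, rates))
```

For example, `normalize_free_rates([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])` rescales the weights to 0.25, 0.25 and 0.5 and divides every rate by the weighted mean 2.25.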
:-)
alexis
(In reply to ddarriba's comment above, posted 27.07.2016 10:20.)
Alexandros (Alexis) Stamatakis
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University of Arizona at Tucson
www.exelixis-lab.org
I have already implemented all these models in IQ-TREE. One has to reimplement the likelihood kernel for these models. For model parameter optimization one can employ an EM algorithm; for estimating the mixture weights, the EM algorithm is guaranteed to find the optimal solution. So if you need help, let me know.
Minh
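A minimal sketch of the EM update for the mixture weights, assuming the rates (and branch lengths) are held fixed so that the per-site, per-category likelihoods L_ik can be computed once. The E-step computes each site's posterior category probabilities w_k L_ik / Σ_j w_j L_ij, and the M-step sets each weight to the average posterior over sites. With the L_ik fixed, the log-likelihood is concave in the weights, so this subproblem has no spurious local optima. This is a Python illustration under those assumptions, not IQ-TREE's implementation; `site_lh` and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def log_likelihood(site_lh: List[List[float]], w: List[float]) -> float:
    """Sum over sites of log(sum_k w_k * L_ik)."""
    return sum(math.log(sum(wk * lk for wk, lk in zip(w, row))) for row in site_lh)


def em_mixture_weights(site_lh: List[List[float]], weights: List[float],
                       tol: float = 1e-10,
                       max_iter: int = 1000) -> Tuple[List[float], float]:
    """Estimate category weights by EM while the category rates stay fixed.

    site_lh[i][k] is the (precomputed, fixed) likelihood of site i under
    rate category k. Returns the final weights and their log-likelihood.
    """
    n = len(site_lh)
    w = list(weights)
    lnl = log_likelihood(site_lh, w)
    for _ in range(max_iter):
        new_w = [0.0] * len(w)
        for row in site_lh:
            # E-step: posterior probability of each category for this site.
            mix = sum(wj * lj for wj, lj in zip(w, row))
            for j, lj in enumerate(row):
                new_w[j] += w[j] * lj / mix
        # M-step: each new weight is the average posterior over all sites.
        w = [x / n for x in new_w]
        new_lnl = log_likelihood(site_lh, w)
        converged = new_lnl - lnl < tol
        lnl = new_lnl
        if converged:
            break
    return w, lnl
```

Each EM iteration never decreases the log-likelihood. If sites are compressed into unique patterns, the pattern counts would have to weight both the posterior sums and the division by n.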
@bqminh: thanks for offering your help!
Do I understand correctly that for the UL/EX/EHO models the rates and weights are fixed:
http://www.atgc-montpellier.fr/download/datasets/models/mix_RatesProps.txt
so there are actually no parameters to optimize?
That's right, these models have default values for rates and weights. However, one should offer the option to optimize the weights (while keeping the rates fixed); I observed significant gains in likelihood. Moreover, there is a special PhyML version (very slow) that also allows optimizing the weights. As I mentioned, the EM algorithm can be used for this purpose.
Minh
here are some random comments. the discrete-rate model has been in paml/baseml since 1994. it is described in Yang, Z. (1995). A space-time process model for the evolution of DNA sequences. Genetics 139: 993-1005; table 2 has some real data results. i use BFGS, so the optimisation is similar to that for the discrete gamma model. if you estimate both the frequencies and the rates as free parameters, you can't fit many categories (e.g. 5 or 6) in real data analysis, but that may be because i tested on small datasets without many sequences in the alignment.
i think that if the interest is in the phylogeny and branch lengths, there is not that much difference among the different rate models.
also my impression is that EM is inefficient as an optimisation algorithm.
best, ziheng
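When both rates and weights are free and a quasi-Newton optimizer such as BFGS is used, the constraints (weights positive and summing to 1, rates positive with mean rate 1) have to be handled somehow. One common approach, shown here as an assumption rather than as what paml actually does, is to optimize over an unconstrained vector and map it to the model: a softmax for the weights and exponentials for the rates, followed by rescaling to mean rate 1. With K categories this leaves 2K - 2 free parameters. The function name `unpack_free_rate_params` is hypothetical.

```python
import math
from typing import List, Tuple


def unpack_free_rate_params(theta: List[float],
                            k: int) -> Tuple[List[float], List[float]]:
    """Map an unconstrained vector of length 2k-2 to (rates, weights).

    theta[:k-1]  -> log-odds of weights 1..k-1 relative to category 0
    theta[k-1:]  -> log-rates of categories 1..k-1 relative to category 0
    Category 0 is the reference, which removes the redundant degrees of
    freedom (weights sum to 1, overall rate scale fixed to mean 1).
    """
    if k < 1 or len(theta) != 2 * k - 2:
        raise ValueError("theta must have length 2k-2")
    # Softmax (shifted by the max for numerical stability) -> weights.
    logits = [0.0] + list(theta[:k - 1])
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Exponentials keep rates positive; then rescale to mean rate 1.
    raw_rates = [1.0] + [math.exp(x) for x in theta[k - 1:]]
    mean_rate = sum(w * r for w, r in zip(weights, raw_rates))
    rates = [r / mean_rate for r in raw_rates]
    return rates, weights
```

The negative log-likelihood expressed as a function of `theta` could then be handed to any BFGS routine, for example `scipy.optimize.minimize(..., method="BFGS")`. The mapping does not order the rates, so categories may swap labels between runs; sorting the rates after optimization is one way to report them consistently.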
(In reply to Bui Quang Minh's comment above, posted 31/07/2016 13:39 -0700.)
dear ziheng,
many thanks for your insights, these pretty much reflect my intuition about the problem.
all the best,
alexis
(In reply to ziheng-yang's comment above, posted 01.08.2016 12:45.)
Hi Ziheng,
thanks for your comments! please see my replies below,
On Aug 1, 2016, at 12:45 PM, ziheng-yang wrote:
> here are some random comments. the discrete-rate model is in paml/baseml since 1994. this is described in YANG, Z., 1995 A space-time process model for the evolution of DNA sequences. Genetics 139: 993-1005.
Yes, I know your paper. And it is quite interesting that some authors proposed this model again (e.g. http://mbe.oxfordjournals.org/content/29/11/3345.full), apparently unaware of your paper. I like the model because it does not assume any distribution.
> table 2 has some real data results. i use BFGS so that the optimisation is similar to the discrete gamma model. if you estimate both the frequencies and the rates as free parameters, you can't fit many categories (like 5 or 6) in real data analysis, but that may be because i tested using small datasets without many sequences in the alignment.
It is precisely on big data sets that the two models give different results. I can show you the data once our paper is published.
> i think that if the interest is in the phylogeny and branch lengths, there is not that much difference among the different rate models.
> also my impression is that EM is inefficient as an optimisation algorithm.
This was also my thought at the beginning. I originally implemented the BFGS algorithm, but then we observed with simulated data (very long alignments) that it sometimes did not find the true rates and weights, which is weird. I then implemented the EM algorithm, and it always found the true estimates. That's why I switched to the EM algorithm.
Note that BFGS and EM are both local optimization methods, so one can never be sure that the optimal estimates have been reached.
Minh
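Because both BFGS and EM are only guaranteed to reach a local optimum when rates and weights are estimated jointly, one simple hedge (not something either poster describes) is to restart the local optimizer from several initial values and keep the best result. The toy Python example below uses a one-dimensional function with two local maxima, and plain gradient ascent as a stand-in for a real optimizer; it is not a phylogenetic likelihood, and all names are hypothetical.

```python
import math
from typing import Callable, Iterable, Tuple


def hill_climb(gradient: Callable[[float], float], x0: float,
               step: float = 0.01, tol: float = 1e-9,
               max_iter: int = 100000) -> float:
    """Plain gradient ascent: converges to whichever local maximum is nearby."""
    x = x0
    for _ in range(max_iter):
        dx = step * gradient(x)
        x += dx
        if abs(dx) < tol:
            break
    return x


def multi_start(objective: Callable[[float], float],
                gradient: Callable[[float], float],
                starts: Iterable[float]) -> Tuple[float, float]:
    """Run the local optimizer from each start and keep the best optimum."""
    best_x, best_val = math.nan, -math.inf
    for x0 in starts:
        x = hill_climb(gradient, x0)
        val = objective(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


def toy_objective(x: float) -> float:
    # Two local maxima: a lower one near x = -0.96, a higher one near x = 1.04.
    return -(x * x - 1.0) ** 2 + 0.3 * x


def toy_gradient(x: float) -> float:
    return -4.0 * x * (x * x - 1.0) + 0.3
```

Started at -1.5, `hill_climb` stops at the lower peak; started at 1.5, it finds the higher one. Restarts do not guarantee the global optimum either, but they make it less likely to report a poor local one.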
Bui Quang Minh
Center for Integrative Bioinformatics Vienna (CIBIV)
Campus Vienna Biocenter 5, VBC5, Ebene 1
A-1030 Vienna, Austria
Phone: ++43 1 4277 74326
Email: minh.bui (AT) univie.ac.at
I think implementing UL, EX, and EHO might be a good idea (Olivier Gascuel liked those models). Regarding free rates, I am not so sure, if they are not already implemented.