stephaneguindon / phyml

PhyML -- Phylogenetic estimation using (Maximum) Likelihood
GNU General Public License v3.0

Wrong distance matrix output #152

Closed: wdenggithub closed this issue 3 years ago

wdenggithub commented 3 years ago

Hi Stephane,

Thanks for adding the distance matrix output. I downloaded the most recent version of PhyML and tested it. The output does not seem to be correct: the matrix is not symmetric, and some distances between a sequence and itself are not 0. Please find an example output attached.

The other issue is that PhyML outputs the distance matrix and then exits without producing a tree. Would it be possible to get the distance matrix before the optimization, the tree estimate, and the distance matrix after the optimization?

Best regards,

Wenjie Deng

Attachment: example_output.txt

stephaneguindon commented 3 years ago

Dear Wenjie,

There was indeed a problem. The matrix produced was in fact the one modified by the BioNJ algorithm, which explains the lack of symmetry and the non-zero values on the diagonal. It should be fixed now. Please see commit 669ae30655d12e79333487ec0e3227044a47d974 and let me know if there are remaining issues.
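For anyone who wants to verify a matrix for the two symptoms reported here (asymmetry and non-zero diagonal entries), a minimal Python sketch follows. It assumes the matrix has already been parsed into a list of rows of floats; the function name `check_distance_matrix`, the tolerance, and the sample values are made up for illustration and are not part of PhyML.

```python
def check_distance_matrix(matrix, tol=1e-9):
    """Return a list of problems found in a square distance matrix.

    `matrix` is a list of rows, each row a list of floats.
    An empty list means the matrix is square, symmetric within `tol`,
    and has zeros (within `tol`) on the diagonal.
    """
    problems = []
    n = len(matrix)

    # The matrix must be square before the other checks make sense.
    for i, row in enumerate(matrix):
        if len(row) != n:
            problems.append(f"row {i} has {len(row)} entries, expected {n}")
            return problems

    for i in range(n):
        # A sequence should be at distance 0 from itself.
        if abs(matrix[i][i]) > tol:
            problems.append(f"diagonal entry ({i}, {i}) is {matrix[i][i]}, expected 0")
        # d(i, j) should equal d(j, i).
        for j in range(i + 1, n):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                problems.append(
                    f"entries ({i}, {j}) and ({j}, {i}) differ: "
                    f"{matrix[i][j]} vs {matrix[j][i]}"
                )
    return problems


# Illustrative values only, not taken from the attached output.
good = [[0.0, 0.1, 0.3], [0.1, 0.0, 0.2], [0.3, 0.2, 0.0]]
bad = [[0.0, 0.1, 0.3], [0.15, 0.02, 0.2], [0.3, 0.2, 0.0]]

print(check_distance_matrix(good))
for problem in check_distance_matrix(bad):
    print(problem)
```

A tolerance is used rather than exact equality because the distances are floating-point values that may have been rounded when printed.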

wdenggithub commented 3 years ago

Dear Stephane,

The output matrix is now correct. Thank you so much. As I mentioned in my previous message, I am also interested in the estimated ML tree and its distance matrix. So I modified your main.c: I commented out the "exit(-1)" line following "Print_Mat(ML_Dist(cdata,mod))" and added another "Print_Mat(ML_Dist(cdata,mod))" after the most likely tree was produced. I now get an ML tree and two distance matrices (one for BioNJ, the other for the ML tree). Is this correct? I also noticed that the second matrix covers only the unique sequences. For the attached example output, I have 28 sequences, 24 of them unique, and I got an ML tree matrix for the 24 unique sequences. If the distances are correct, I can parse the matrix and expand it to cover all 28 sequences. Thanks again for your help.
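The expansion step mentioned above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the matrix over unique sequences has already been parsed, and the `representative` mapping from each removed duplicate to the unique sequence it is identical to is something the user builds separately (for example by comparing the input sequences). The function name and the sample data are hypothetical and are not produced by PhyML.

```python
def expand_matrix(unique_names, unique_matrix, all_names, representative):
    """Expand a distance matrix over unique sequences to all sequences.

    unique_names   -- names of the unique sequences, in matrix order
    unique_matrix  -- square list-of-lists of distances for unique_names
    all_names      -- names of all sequences, duplicates included
    representative -- dict mapping each duplicate name to the unique
                      sequence it is identical to (hypothetical input)
    """
    index = {name: k for k, name in enumerate(unique_names)}
    # Each sequence points at the row of its representative;
    # a unique sequence represents itself.
    rows = [index[representative.get(name, name)] for name in all_names]
    # A duplicate inherits every distance of its representative, and its
    # distance to that representative is the diagonal value.
    return [[unique_matrix[a][b] for b in rows] for a in rows]


# Illustrative data only.
unique_names = ["A", "B", "C"]
unique_matrix = [[0.0, 0.1, 0.3], [0.1, 0.0, 0.2], [0.3, 0.2, 0.0]]
all_names = ["A", "A_dup", "B", "C"]
representative = {"A_dup": "A"}

expanded = expand_matrix(unique_names, unique_matrix, all_names, representative)
for name, row in zip(all_names, expanded):
    print(name, row)
```

This relies on identical sequences having identical distances to every other sequence, which is the same reasoning that lets duplicates be removed in the first place.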

Best regards,

Wenjie

Attachment: phyml_example_output.txt

stephaneguindon commented 3 years ago

Yes, commenting out the exit(-1) should work as expected. By default, PhyML removes duplicate sequences before launching the reconstruction so as to speed up the calculation. You may want to try the --leave_duplicates option if keeping the duplicate sequences in the analysis is more convenient for you.

wdenggithub commented 3 years ago

Dear Stephane,

It works perfectly. Thanks so much for your help. I really appreciate it!

Wenjie

stephaneguindon commented 3 years ago

Great!