veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Problems with different output results #1551

Closed 530NLZ closed 1 year ago

530NLZ commented 1 year ago

Dear HyPhy team, when I run HyPhy, the p-value reported by the BUSTED method differs from the one shown on the official website. Is this normal? (My p-value is about 0.1562; the website shows 0.0015.) My command is: time mpirun --allow-run-as-root -np 16 HYPHYMPI BUSTED --alignment /home/zzl/tutorial_data/ksr2.fna

I would also like to ask: the intermediate output printed during the run is inconsistent with the official website, but the final conclusion of the analysis is consistent with it. Is this normal?

spond commented 1 year ago

Dear @530NLZ,

Which version of HyPhy do you have installed locally ($hyphy --version)? Also, what do you mean by the "official" website? The tutorial document from 2017?

There have been a lot of changes to HyPhy since then, and some options and analyses have changed.

Best, Sergei

530NLZ commented 1 year ago

Thanks for your time. I currently use HyPhy version 2.5.20, and the official website I am referring to is http://www.hyphy.org/tutorials/CL-prompt-tutorial

I want to compare my results with those shown on that page to verify that my build and installation of HyPhy are correct. Currently, make test passes 100%. Does this prove that my HyPhy build has no problems? Is it normal to get different results each time I run HyPhy with the same command? (I have no background in biology; I am only porting the software.)

530NLZ commented 1 year ago

For example, here is part of the output I get when I run HyPhy with the following command: mpirun --allow-run-as-root -np 16 HYPHYMPI BUSTED --alignment /home/zzl/tutorial_data/ksr2.fna

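|     Selection mode     | dN/dS | Proportion, % |        Notes         |
|------------------------|-------|---------------|----------------------|
|   Negative selection   | 0.000 |     5.809     |                      |
|   Negative selection   | 0.012 |    93.813     |                      |
| Diversifying selection | 9.699 |     0.378     |                      |

| Rate  | Proportion, % |        Notes         |
|-------|---------------|----------------------|
| 0.411 |    85.718     |                      |
| 4.526 |     6.154     |                      |
| 4.541 |     8.128     | Collapsed rate class |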

Performing the constrained (dN/dS > 1 not allowed) model fit

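|   Selection mode   | dN/dS | Proportion, % | Notes |
|--------------------|-------|---------------|-------|
| Negative selection | 0.000 |    12.561     |       |
| Negative selection | 0.000 |    84.129     |       |
| Neutral evolution  | 1.000 |     3.310     |       |

|  Rate  | Proportion, % | Notes |
|--------|---------------|-------|
| 0.379  |    85.112     |       |
| 3.576  |    13.986     |       |
| 19.648 |     0.902     |       |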

Branch-site unrestricted statistical test of episodic diversification [BUSTED]

Likelihood ratio test for episodic diversifying positive selection, p = 0.1563.

The output in the bold part changes every time I run the same command, while the output in the italic part is always identical or differs by less than 1%. Is this normal?

530NLZ commented 1 year ago

Attached are some screenshots of the run. The values in the red box differ on every run, while the values in the yellow box are identical or differ by less than 1% between runs. Is this normal?

spond commented 1 year ago

Dear @530NLZ,

I would suggest using the latest version of HyPhy (2.5.45 at this time), because it will be faster and more accurate (generally).

If make test passes on your system, then HyPhy is performing "correctly" in some sense. There are three things to keep in mind when looking at model results.

First, analyses themselves may change. For example, when the tutorial was written, BUSTED did not (by default) turn on site-to-site synonymous rate variation. It does now because of results like this. So if you want to run the same analysis as in the tutorial, try

$hyphy busted --alignment /Users/sergei/Downloads/tutorial_data/ksr2.fna  --srv No

* Log(L) = -5319.93, AIC-c = 10708.14 (34 estimated parameters)
* For *test* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.024     |   98.068    |                                   |
|        Negative selection         |     0.092     |    1.894    |                                   |
|      Diversifying selection       |    115.804    |    0.038    |                                   |

### Performing the constrained (dN/dS > 1 not allowed) model fit
* Log(L) = -5326.44, AIC-c = 10719.15 (33 estimated parameters)
* For *test* branches under the null (no dN/dS > 1 model), the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |   94.226    |                                   |
|        Negative selection         |     0.000     |    2.465    |       Collapsed rate class        |
|         Neutral evolution         |     1.000     |    3.309    |                                   |

----
## Branch-site unrestricted statistical test of episodic diversification [BUSTED]
Likelihood ratio test for episodic diversifying positive selection, **p =   0.0007**.
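For readers new to likelihood ratio tests, the arithmetic behind the line above can be sketched in Python. The two Log(L) values are copied from this output. The helper names are hypothetical, and the chi-square reference distribution with 2 degrees of freedom is an assumption made only for illustration: the thread does not say which reference distribution HyPhy uses, so the tail probability computed here need not equal the printed p-value.

```python
import math


def lrt_statistic(logl_alt: float, logl_null: float) -> float:
    """Likelihood ratio statistic: twice the log-likelihood gain of the
    unconstrained model over the constrained one."""
    return 2.0 * (logl_alt - logl_null)


def chi2_sf_2df(x: float) -> float:
    """Upper tail of a chi-square distribution with 2 degrees of freedom.
    For 2 df this has the closed form exp(-x / 2).
    ASSUMPTION: 2 df is an illustrative choice, not HyPhy's documented one."""
    return math.exp(-x / 2.0)


# Log(L) values reported above for the unconstrained and constrained fits.
LOGL_UNCONSTRAINED = -5319.93
LOGL_CONSTRAINED = -5326.44

stat = lrt_statistic(LOGL_UNCONSTRAINED, LOGL_CONSTRAINED)
print(f"LRT statistic: {stat:.2f}")
print(f"chi-square (2 df) tail probability: {chi2_sf_2df(stat):.4f}")
```

The larger the gap between the two Log(L) values, the stronger the evidence against the constrained model, which is why small changes in either fit can move the p-value noticeably.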

Second, some analyses in HyPhy have a "stochastic" component; for example, BUSTED will try to guess starting values for the optimization of the likelihood function, and some of those are chosen with random variation. These initial guesses can influence the final result. Some analyses (e.g. BUSTED and RELAX) support the --starting-points N command line argument, where N is the number of starting guesses. Setting this number to ≥5 may improve run-to-run stability.

Third, there's some unavoidable numerical variability between systems, versions, and runs. It should be minor, but may influence the results a little.
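One way to check run-to-run stability in practice is to repeat the analysis and compare the reported p-values. The Python sketch below is only an illustration: the function names are hypothetical, it assumes the executable is called `hyphy` and is on the PATH, it uses only the flags mentioned in this thread (`--alignment`, `--srv`, `--starting-points`), and the 0.05 threshold is an assumed convention rather than something stated here.

```python
import re

# Matches "p = 0.1563" as well as the markdown form "**p =   0.0007**".
P_VALUE_RE = re.compile(r"p\s*=\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def parse_p_value(report: str):
    """Return the p-value from the last 'Likelihood ratio test' line,
    or None if the report contains no such line."""
    for line in reversed(report.splitlines()):
        if "Likelihood ratio test" in line:
            match = P_VALUE_RE.search(line)
            if match:
                return float(match.group(1))
    return None


def busted_command(alignment: str, starting_points: int = 5, srv=None):
    """Build an argument list for one BUSTED run.
    ASSUMPTION: the executable is named 'hyphy' and is on the PATH."""
    cmd = ["hyphy", "busted", "--alignment", alignment,
           "--starting-points", str(starting_points)]
    if srv is not None:
        cmd += ["--srv", srv]
    return cmd


def summarize(p_values, alpha: float = 0.05):
    """Report the spread of p-values and whether every run falls on the
    same side of alpha (an assumed significance threshold)."""
    calls = {p <= alpha for p in p_values}
    return {
        "min": min(p_values),
        "max": max(p_values),
        "same_conclusion": len(calls) == 1,
    }


# The two result lines quoted in this thread, used as sample input.
runs = [
    "Likelihood ratio test for episodic diversifying positive selection, p = 0.1563.",
    "Likelihood ratio test for episodic diversifying positive selection, **p =   0.0007**.",
]
print(summarize([parse_p_value(r) for r in runs]))
```

The list returned by `busted_command` could be passed to `subprocess.run(..., capture_output=True, text=True)` and its standard output fed to `parse_p_value`; the sketch stops short of launching HyPhy so that it runs without an installation.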

Best, Sergei

530NLZ commented 1 year ago

Dear Sergei,

Thank you for your reply, which is very useful. I wish you smooth work and a happy life.

Best, Layton