Problems with different output results

530NLZ commented 1 year ago

Dear HyPhy team, when I run HyPhy, the p-value reported by the BUSTED method differs from the one on the official website (my output is about 0.1562; the website shows 0.0015). Is this normal? My run command is: time mpirun --allow-run-as-root -np 16 HYPHYMPI BUSTED --alignment /home/zzl/tutorial_data/ksr2.fna

I would also like to ask whether it is normal for the intermediate output during the run to differ from the official website while the final conclusion of the analysis agrees with it.

spond commented 1 year ago

Dear @530NLZ,

Which version of HyPhy do you have installed locally ($hyphy --version)? Also, what do you mean by the "official" website? The tutorial document from 2017?

There have been a lot of changes to HyPhy since then, and some options and analyses have changed.

Best, Sergei

530NLZ commented 1 year ago

Thanks for your time. I currently use version 2.5.20 of HyPhy, and the official website I am referring to is http://www.hyphy.org/tutorials/CL-prompt-tutorial

I want to compare my results with those shown on this website to confirm that my compilation and installation of HyPhy is correct. At present, make test passes 100% on my system. Does this prove that my HyPhy build has no problems? Is it normal to get different results each time I run HyPhy with the same command? (I have no background in biology; I am only porting the software.)

530NLZ commented 1 year ago

For example, here is part of HyPhy's output from the following command: mpirun --allow-run-as-root -np 16 HYPHYMPI BUSTED --alignment /home/zzl/tutorial_data/ksr2.fna

For test branches, the following rate distribution for branch-site combinations was inferred

| Selection mode | dN/dS | Proportion, % |
|----------------|-------|---------------|
| Negative selection | 0.000 | 5.809 |
| Negative selection | 0.012 | 93.813 |
| Diversifying selection | 9.699 | 0.378 |

The following rate distribution for site-to-site synonymous rate variation was inferred

| Rate | Proportion, % | Notes |
|------|---------------|-------|
| 0.411 | 85.718 | |
| 4.526 | 6.154 | |
| 4.541 | 8.128 | Collapsed rate class |

Performing the constrained (dN/dS > 1 not allowed) model fit

Log(L) = -5301.39, AIC-c = 10679.13 (38 estimated parameters)
For test branches under the null (no dN/dS > 1 model), the following rate distribution for branch-site combinations was inferred

| Selection mode | dN/dS | Proportion, % |
|----------------|-------|---------------|
| Negative selection | 0.000 | 12.561 |
| Negative selection | 0.000 | 84.129 |
| Neutral evolution | 1.000 | 3.310 |

The following rate distribution for site-to-site synonymous rate variation was inferred

| Rate | Proportion, % | Notes |
|------|---------------|-------|
| 0.379 | 85.112 | |
| 3.576 | 13.986 | |
| 19.648 | 0.902 | |

Branch-site unrestricted statistical test of episodic diversification [BUSTED]

Likelihood ratio test for episodic diversifying positive selection, p = 0.1563.

The values shown in bold change every time the same command is executed, while the values shown in italics are always identical or differ by less than 1%. Is this normal?

530NLZ commented 1 year ago

Attached are some screenshots of the run. The values in the red box differ on every run, while the values in the yellow box are identical or differ by less than 1% each time. Is this normal?

spond commented 1 year ago

Dear @530NLZ,

I would suggest using the latest version of HyPhy (2.5.45 at this time), because it will be faster and more accurate (generally).

If make test passes on your system, then HyPhy is performing "correctly" in some sense. There are three things to keep in mind when looking at model results.

First, analyses themselves may change. For example, when the tutorial was written, BUSTED did not (by default) turn on site-to-site synonymous rate variation. It does now because of results like this. So if you want to run the same analysis as in the tutorial, try

$hyphy busted --alignment /Users/sergei/Downloads/tutorial_data/ksr2.fna  --srv No

* Log(L) = -5319.93, AIC-c = 10708.14 (34 estimated parameters)
* For *test* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.024     |   98.068    |                                   |
|        Negative selection         |     0.092     |    1.894    |                                   |
|      Diversifying selection       |    115.804    |    0.038    |                                   |

### Performing the constrained (dN/dS > 1 not allowed) model fit
* Log(L) = -5326.44, AIC-c = 10719.15 (33 estimated parameters)
* For *test* branches under the null (no dN/dS > 1 model), the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |   94.226    |                                   |
|        Negative selection         |     0.000     |    2.465    |       Collapsed rate class        |
|         Neutral evolution         |     1.000     |    3.309    |                                   |

----
## Branch-site unrestricted statistical test of episodic diversification [BUSTED]
Likelihood ratio test for episodic diversifying positive selection, **p =   0.0007**.
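
As a rough cross-check of the numbers above, the likelihood ratio statistic can be recomputed from the two reported log-likelihoods. The sketch below is only an illustration: it assumes a chi-square reference distribution with 2 degrees of freedom, which this thread does not confirm is what HyPhy uses. Under that assumption the statistic is about 13.02 and the p-value is about 0.0015, roughly twice the 0.0007 printed above, so the reference distribution HyPhy applies in this version is evidently not the one assumed here.

```python
import math

# Log-likelihoods copied from the output above.
logl_unconstrained = -5319.93  # alternative model (dN/dS > 1 allowed)
logl_constrained = -5326.44    # null model (dN/dS > 1 not allowed)

# Likelihood ratio statistic: twice the log-likelihood difference.
lrt = 2.0 * (logl_unconstrained - logl_constrained)

# ASSUMPTION (not stated in this thread): a chi-square reference
# distribution with 2 degrees of freedom. Its upper-tail probability
# has the closed form exp(-x / 2), so no statistics library is needed.
p_chi2_df2 = math.exp(-lrt / 2.0)

print(f"LRT = {lrt:.2f}")
print(f"p (chi-square, 2 df, assumed) = {p_chi2_df2:.4f}")
```

The point of the check is that the p-value depends only on the two log-likelihoods and the reference distribution, so a difference in either one between versions is enough to change the reported result.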

Second, some analyses in HyPhy have a "stochastic" component; for example, BUSTED will try to guess starting values for the optimization of the likelihood function, and some of those are chosen with random variation. These initial guesses can influence the final result. Some analyses (e.g. BUSTED and RELAX) support the --starting-points N command line argument, where N is the number of starting guesses. Setting this number to ≥5 may improve run-to-run stability.
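
A minimal sketch of assembling such a command from a script is shown below. The helper `build_busted_command` and the relative alignment path are hypothetical names introduced for illustration; the `--alignment`, `--srv` and `--starting-points` options are the ones mentioned in this thread.

```python
import shlex


def build_busted_command(alignment, srv=None, starting_points=None):
    """Return an argv list for a BUSTED run (hypothetical helper)."""
    argv = ["hyphy", "busted", "--alignment", alignment]
    if srv is not None:
        # e.g. "No" to switch off site-to-site synonymous rate variation
        argv += ["--srv", srv]
    if starting_points is not None:
        if starting_points < 1:
            raise ValueError("starting_points must be a positive integer")
        argv += ["--starting-points", str(starting_points)]
    return argv


# Hypothetical path; replace with the location of your alignment.
cmd = build_busted_command("tutorial_data/ksr2.fna", srv="No", starting_points=5)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `hyphy` is on the PATH; building the list separately makes it easy to repeat the same run several times and compare the outputs.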

Third, there's some unavoidable numerical variability between systems, versions, and runs. It should be minor, but may influence the results a little.
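
One way to see how much two runs differ is to pull the summary numbers out of the saved output and compare them. The sketch below assumes the `Log(L) = ...` and `Likelihood ratio test ... p = ...` line formats shown in this thread; the regular expressions and the `summarize` helper are illustrative, not part of HyPhy. The two excerpts are the constrained-model fits quoted earlier, one with synonymous rate variation and one with `--srv No`, so a large difference between them is expected; they serve only as parsing input.

```python
import re

# Patterns assume the output format shown in this thread.
LOGL_RE = re.compile(r"Log\(L\)\s*=\s*(-?[0-9]+(?:\.[0-9]+)?)")
P_VALUE_RE = re.compile(
    r"Likelihood ratio test[^\n]*?\bp\s*=\s*"
    r"([0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?)"
)


def summarize(output_text):
    """Extract all log-likelihoods and the LRT p-value from BUSTED output."""
    logls = [float(v) for v in LOGL_RE.findall(output_text)]
    match = P_VALUE_RE.search(output_text)
    p_value = float(match.group(1)) if match else None
    return {"log_likelihoods": logls, "p_value": p_value}


run_with_srv = """\
Log(L) = -5301.39, AIC-c = 10679.13 (38 estimated parameters)
Likelihood ratio test for episodic diversifying positive selection, p = 0.1563.
"""
run_without_srv = """\
* Log(L) = -5326.44, AIC-c = 10719.15 (33 estimated parameters)
Likelihood ratio test for episodic diversifying positive selection, **p =   0.0007**.
"""

a = summarize(run_with_srv)
b = summarize(run_without_srv)
print("p-values:", a["p_value"], b["p_value"])
print("log-likelihood gap:", abs(a["log_likelihoods"][0] - b["log_likelihoods"][0]))
```

For repeated runs of the same command, the same comparison shows whether the differences stay small or whether the runs are converging to different optima.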

Best, Sergei

530NLZ commented 1 year ago

Dear Sergei,

Thank you for your reply; it was very helpful. I wish you smooth work and a happy life.

Best, Layton

veg / hyphy

Problems with different output results #1551
