veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org
Other
200 stars 68 forks source link

Calculate branch-class specific omega #1625

Closed chenyangkang closed 10 months ago

chenyangkang commented 1 year ago

Hi team,

I came across a problem: How to find branch-(omega)class specific rate in BUSTED results? My intention is to calculate the average purifying selection strength as (proportion1 · dN/dS_1) + (proportion2 · dN/dS_2)), where dN/dS_1 and dN/dS_2 are the two omega<1 classes, and calculate the positive selection as (proportion3 · dN/dS_3), where the dN/dS_3 is the omega of the third class -- positive selection.

Now I can only find the proportion in the result file: image

But how to know the dN/dS for each site class?

Seems like we only got a uniformed value for each branch: image Like the "unconstrained" in the screenshot above.

I think in aBSREL we can have the proportion of rate class, and the rate, at the same time. Do I need to shift to aBSREL?

Thanks! Yangkang

spond commented 1 year ago

Dear @chenyangkang,

You are proposing to compute conditional prior expectations on the inferred ω distribution. BUSTED does not estimate separate ω per branch. It will estimate a distribution of ω shared by all branches in the Test class (which is all branches by default) and in the Background class (if you specify --branches ... on the command line).

For example, if you run hyphy busted --alignment tests/hbltests/libv3/data/CD2.nex --branches GROUP1 and look at the resulting JSON, the ω distributions are in .["fits"]["Unconstrained model"]["Rate Distributions"] (using jq notation)

image

In more recent versions of BUSTED you will also find a per-branch estimate of posterior probabilities that ω = class. These are written (for Test branches only!) into the corresponding branch level entries, for example:

jq '.["branch attributes"]["0"]["RHMONKEY"]' tests/hbltests/libv3/data/CD2.nex.BUSTED.json
image

Best, Sergei

PS. For branch-level estimates of ω aBSREL may be better.

chenyangkang commented 1 year ago

@spond Thanks Sergei! I'll use aBSREL then. I have another question related to the branch-specific rate. Do you think it is reasonable to regress the omega estimated for each tips and their phenotypes (for example body mass), to detect if certain gene is to be accounted for the evolution of the trait?

spond commented 1 year ago

Dear @chenyangkang,

It's not unreasonable, but I would view such an analysis as "rough and exploratory". There's a big statistical issue in treating estimated dN/dS as "observed" for regression estimates. Basically, the variance in dN/dS (precision of estimates) will be different between tips due to a variety of factors, and this needs to be accounted for in regression.

I would suggest some sort of "joint" estimation model, where phenotypic / genotypic information is incorporated into the estimate itself. For discrete phenotypes, for example, we have https://github.com/veg/hyphy-analyses/tree/master/BUSTED-PH.

For continuous phenotypes, there has been quite a bit of literature work on some form of joint models (usually in the Bayesian framework).

Best, Sergei

chenyangkang commented 1 year ago

@spond Thanks Sergei. It's important that you mentioned the varied precision of dN/dS estimation on each branch. I'm wandering if it is possible to also output the variance for that estimate at each branch? So that the uncertainty could be incorporated into downstream bayesian model.

Speaking of joint model for continuous phenotypes, would you mind recommend me some papers? Since I've been searching for them for a while but most are less related. Thank you in advance!

Yangkang

spond commented 1 year ago

Dear @chenyangkang,

Which quantity would you like to have an estimated variance for? For maximum likelihood methods, the "cheapest" estimate is profile likelihood for individual parameters; these may or may not be what you want. Also, if you are interested in mean dN/dS estimates, then you might be better off using something simpler like FitMG94.bf because with aBSREL you are going to get distributions of dN/dS per branch, which would then be combined into an aggregate value.

As far as papers go, you can take a look at https://www.nature.com/articles/s41559-022-01932-7 (although it's for discrete phenotypes), https://academic.oup.com/evolut/article/66/6/1773/6851530 (Bayesian joint models).

Best, Sergei

github-actions[bot] commented 10 months ago

Stale issue message