veg / hyphy-analyses

HyPhy standalone analyses
MIT License

Synonymous rate variation #10

Open kavithamallayaa opened 4 years ago

kavithamallayaa commented 4 years ago

Hi,

I am investigating selection efficiency on a set of protein-coding sequences and have applied RELAX to assess the relative intensity of selection on test branches compared with reference branches, with some branches left unclassified. There was evidence of intensification of selection for some sequences with very high ω3 values (I have reported this issue).

I repeated the RELAX analyses with different numbers of rate classes (2, 3, and 4). The number of sequences with these high omega values tends to increase drastically with the number of rate classes. Can I infer that the branch model is better than the branch-site model for this data?

I am also interested in studying synonymous rate variation (SRV) for this data, but on a relative scale: I would like to investigate the difference in SRV between the test and reference branches. Is it possible to obtain SRV for the test and reference branches separately?

with regards Kavitha

spond commented 4 years ago

Dear @kavithamallayaa,

> I repeated the RELAX analyses with different numbers of rate classes (2, 3, and 4). The number of sequences with these high omega values tends to increase drastically with the number of rate classes. Can I infer that the branch model is better than the branch-site model for this data?

Convergence issues for complex models typically indicate that the data are too sparse for reliable inference: too few sequences, too little divergence, or a test/reference set that is too small (one or two branches, etc.). You should compare the AIC-c scores obtained for models with different numbers of rate classes; if there is no support for the more complex models (3 or 4 rate classes), choose the simplest one. The "branch" model (one ω per branch set) is equivalent to 1 rate class. It is fitted as part of the initial RELAX procedure, so you can include its AIC-c in the comparison as well. In the example below, the "branch model" has an AIC-c of 6981.93:

### Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model
* Log(L) = -3457.35, AIC-c =  6981.93 (33 estimated parameters)
* non-synonymous/synonymous rate ratio for *Reference* =   1.5214
* non-synonymous/synonymous rate ratio for *Test* =   0.6122
* non-synonymous/synonymous rate ratio for *Unclassified* =   0.6256

and the 2-rate RELAX model has an AIC-c of 6924.72 (strongly preferred to the "branch model"):

### Fitting the alternative model to test K != 1
* Log(L) = -3424.59, AIC-c =  6924.72 (37 estimated parameters)
* Relaxation/intensification parameter (K) =     0.20

Feel free to respond with the AIC-c values inferred for your dataset with different numbers of rate classes, and we can take a look.
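The comparison described above can be sketched in a few lines of Python. This is only an illustration, not part of HyPhy: the helper names `parse_fit` and `rank_by_aicc` are hypothetical, the regular expression assumes the fit-summary line format shown in the two output excerpts above, and the Akaike weights are a standard way to express relative support rather than something RELAX reports.

```python
import math
import re

# Matches fit-summary lines in the format shown above, e.g.
# "* Log(L) = -3457.35, AIC-c =  6981.93 (33 estimated parameters)"
FIT_LINE = re.compile(
    r"Log\(L\)\s*=\s*(-?\d+(?:\.\d+)?),\s*AIC-c\s*=\s*(-?\d+(?:\.\d+)?)\s*"
    r"\((\d+) estimated parameters\)"
)


def parse_fit(line):
    """Return (log_likelihood, aic_c, n_parameters) from one summary line."""
    match = FIT_LINE.search(line)
    if match is None:
        raise ValueError(f"not a fit-summary line: {line!r}")
    return float(match.group(1)), float(match.group(2)), int(match.group(3))


def rank_by_aicc(scores):
    """Rank models by AIC-c (lower is better).

    Returns a list of (name, aic_c, delta, akaike_weight) tuples, best first.
    """
    best = min(scores.values())
    deltas = {name: value - best for name, value in scores.items()}
    relative = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(relative.values())
    ordered = sorted(scores, key=scores.get)
    return [(n, scores[n], deltas[n], relative[n] / total) for n in ordered]


# The two summary lines quoted in this comment; the dictionary keys are
# descriptive labels chosen for this sketch.
lines = {
    "branch model (1 rate class)":
        "* Log(L) = -3457.35, AIC-c =  6981.93 (33 estimated parameters)",
    "RELAX alternative (2 rate classes)":
        "* Log(L) = -3424.59, AIC-c =  6924.72 (37 estimated parameters)",
}

scores = {name: parse_fit(line)[1] for name, line in lines.items()}
ranking = rank_by_aicc(scores)

for name, aicc, delta, weight in ranking:
    print(f"{name}: AIC-c = {aicc:.2f}, delta = {delta:.2f}, weight = {weight:.3g}")
```

To extend the comparison to 3 or 4 rate classes, add the corresponding summary lines from those runs to `lines`; the model with the lowest AIC-c is listed first.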

As for testing SRV: yes, it is possible to obtain such a distribution (not with RELAX, but with BUSTED[S]). The version of the file that ships with the distribution uses the same SRV distribution for test and reference branches, but the BUSTED-SR version does not (it estimates separate distributions for different branch partitions).

Best, Sergei

kavithamallayaa commented 4 years ago

Hi,

I will look into the RELAX-AIC-c scores and update you.

I was going through the BUSTED-SR version.

The output clearly states that one rate distribution for site-to-site synonymous rate variation was inferred for the background branches. However, the table that follows, also for site-to-site synonymous rate variation, does not say which branches it applies to. I assume it is for the test branches. Can you please clarify this for me?

Thanks for your support.

regards Kavitha

spond commented 4 years ago

Dear @kavithamallayaa,

The "unnamed" distribution of SRV is for the test branches.

Best, Sergei

kavithamallayaa commented 4 years ago

Dear Sergei,

Will the SRV based on BUSTED differ from the SRV based on BUSTEC? If it does, I am more interested in the BUSTEC-based SRV, as my study focuses mainly on estimating the efficiency of purifying selection.

regards Kavitha

spond commented 4 years ago

Dear @kavithamallayaa,

Ideally, the SRV should be influenced by the dN/dS distribution only weakly. Of course the easiest way to check is just to run the analyses.
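Once both analyses have been run, one simple way to compare the inferred SRV distributions is to summarize each discrete distribution by its mean and coefficient of variation. The sketch below assumes each distribution is available as a list of rates with matching weights; the function name `summarize` and all numeric values are placeholders for illustration, not output from any analysis.

```python
import math


def summarize(rates, weights):
    """Mean, variance, and coefficient of variation of a discrete rate distribution."""
    if len(rates) != len(weights) or not rates:
        raise ValueError("rates and weights must be non-empty and of equal length")
    total = sum(weights)
    probs = [w / total for w in weights]  # renormalize in case of rounding
    mean = sum(p * r for p, r in zip(probs, rates))
    variance = sum(p * (r - mean) ** 2 for p, r in zip(probs, rates))
    cv = math.sqrt(variance) / mean if mean > 0 else float("nan")
    return {"mean": mean, "variance": variance, "cv": cv}


# Placeholder distributions (NOT real results): replace with the rates and
# weights reported for each branch partition or each method.
test_srv = summarize(rates=[0.5, 1.0, 2.0], weights=[0.25, 0.5, 0.25])
background_srv = summarize(rates=[0.8, 1.0, 1.2], weights=[0.25, 0.5, 0.25])

print("test:", test_srv)
print("background:", background_srv)
```

A larger coefficient of variation indicates more site-to-site heterogeneity in synonymous rates; a value of 0 means no variation at all.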

Best, Sergei

kavithamallayaa commented 4 years ago

Dear Sergei,

I am more than happy to run both analyses. Is there something similar to BUSTED-SR for BUSTEC, so that I can run both analyses in parallel?

I am more focused on SRV based on unconstrained models, and I hope it will not change much between BUSTED and BUSTEC.

regards Kavitha
