Closed liamfriar closed 7 months ago
Dear @liamfriar,
Thank you for your question. While RELAX does calculate ω values specific to each branch, they aren't readily available in standard output files. Only the k
parameters are. To obtain these raw ω values would require modifications to the RELAX.bf
script.
An alternative approach would be to use aBSREL
, which does output inferred rate distributions. However, this might not align with what is inferred in RELAX due to how the branch sets are structured and the additional k
parameter in RELAX.
Best, Steven
Hi Steven,
Thank you for your response. Do you think aBSREL
would be helpful for the following?
I have ~10 ingroup genomes and ~40 outgroup genomes. I expect the ingroup genomes to be experiencing relaxed selection on many genes. To that end, I have run RELAX
on about 2000 single copy gene trees across the 50 genomes. That worked great. I am also curious about whether certain metabolic pathways or cellular functions are under more or less relaxed selective pressures. One way to approach this would be to just look at the distribution of "relaxation" vs. "intensification" vs. "not_significant" results from RELAX
in the different pathways, but I think that could be confusing given confidence p-values for each individual result. I think it would be interesting to instead get the dN/dS
values for each branch from each gene in each pathway and plot these as distributions for each pathway.
So, for instance: if pathway X has geneA, geneB, geneC, then get all the dN/dS
values for each of geneA,B,C and plot these all together for ingroup and outgroup and see if there is a significant (probably T-test) difference between the ingroup and outgroup distributions. I believe there will be very few cases where dN/dS > 1
, so if those are rare, I will throw them out to make interpretation of the results more straightforward.
It looks like aBSREL
gives a distribution of omega
values for each (#1198), so I am not sure how I would parse those results? I could limit the model to 1 omega
value per branch?
Any and all thoughts/suggestions would be hugely helpful. hyphy RELAX
was great to run and super useful.
It looks as though FitMG94
might be the easiest way to get just an omega
value for each branch in a gene tree?
Alternatively, from the RELAX
results, is this the mean omega
values for each group of genes?
### Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model
* Log(L) = -8145.95, AIC-c = 16505.96 (106 estimated parameters)
* non-synonymous/synonymous rate ratio for *Reference* = 0.1185
* non-synonymous/synonymous rate ratio for *Test* = 0.1669
* non-synonymous/synonymous rate ratio for *Unclassified* = 0.2065
Dear @liamfriar,
They are omega
values for each branch set defined, but they are not the mean
, they are computed by maximum likelihood.
For full details of RELAX values in the JSON output, we have a manual here if that helps at all.
Best, Steven
Dear @liamfriar,
Base on your description, I would do the following
FitMG94
with the local options, like so (updaing the paths as needed)hyphy FitMG94.bf --alignment CD2.nex --type local
...
### Estimating confidence intervals for dN/dS along each branch
| Branch | Length | dN/dS |Approximate dN/dS CI|
|:----------------------------:|:--------------:|:--------------:|:------------------:|
| PIG | 0.192 | 1.345 | 0.966 - 1.809 |
| COW | 0.253 | 1.918 | 1.456 - 2.479 |
| Node5 | 0.103 | 1.482 | 0.836 - 2.304 |
| HORSE | 0.209 | 1.245 | 0.926 - 1.635 |
| CAT | 0.277 | 1.604 | 1.241 - 2.038 |
| Node4 | 0.066 | 0.664 | 0.275 - 1.202 |
| RHMONKEY | 0.004 |10000000000.0...|0.000 - 10000.000...|
| BABOON | 0.002 | 0.000 | 0.000 - 0.686 |
| Node11 | 0.026 | 0.401 | 0.139 - 0.817 |
| HUMAN | 0.000 | 1.000 |0.000 - 10000.000...|
| CHIMP | 0.002 |10000000000.0...|0.000 - 10000.000...|
| Node14 | 0.018 | 0.368 | 0.052 - 0.924 |
| Node10 | 0.110 | 1.915 | 1.270 - 2.728 |
| Node3 | 0.290 | 0.432 | 0.317 - 0.573 |
| RAT | 0.066 | 1.089 | 0.610 - 1.717 |
| MOUSE | 0.122 | 0.525 | 0.343 - 0.755 |
Confidence Intervals
object, from which you want the MLE. dS
and dN
are computed as explained in section 1.4.3
of http://hyphy.org/resources/hyphybook2007.pdf, and their ratio will be close to but not identical to the MLE estimate of ωWith jq
you can extract all the values like so:
jq -r '["Branch", "dN/dS"], (.["branch attributes"]["0"] | to_entries | map ([.key,.value["Confidence Intervals"].MLE])[]) | @csv ' CD2.nex.FITTER.json
...
Branch","dN/dS"
"BABOON",0
"CAT",1.60381523178166
"CHIMP",5910.638940970648
"COW",1.918003612281158
"HORSE",1.24537295438726
"HUMAN",0
"MOUSE",0.5249419169958035
"Node10",1.914890634109143
"Node11",0.4008445971927094
"Node14",0.368278547471397
"Node3",0.4318011219571806
"Node4",0.6637552081371882
"Node5",1.482128238110369
"PIG",1.344721109212511
"RAT",1.08887083731336
"RHMONKEY",10000
Best, Sergei
@spond 's last suggestion appears to have worked great. And thank you for the jq
one-liner! This issue #898 was also helpful for me to run long sequences of concatenated genes.
Dear @liamfriar,
My pleasure. Glad this was helpful.
Best, Sergei
Hi,
I am wondering if there is a way to get the dN/dS values calculated along each branch from the RELAX outputs? Or do I have to run a different program to get those raw numbers?
Thank you!