Closed francicco closed 6 years ago
Dear @francicco,
I would not necessarily expect anything specific for the sister species; did you use the rest of the tree as the reference
set of branches? Remember that K
is always defined relative to something, and the choice of reference is very important.
Also, keep in mind that all K
are inferred with (sometimes significant errors) whose magnitude is not independent of the value of K
. When doing kernel density estimation, consider using something that accounts for such error (e.g. https://arxiv.org/pdf/0805.2216.pdf, which I am not sure is directly applicable here).
Best, Sergei
Hi @spond,
Yes, as reference
I used the rest of the tree.
And this is the K
behaviour is a specific set of genes
A Wilcoxon signed-rank test shows a significance distribution between Talp
and the other species.
My interpretation would be a stronger relaxation in the protein coding genes of this species. Does it make sense?
Best Francesco
Dear @francicco,
Based on your plot one could expect differences between Talp
and the outgrip species. So, yes, the outgroup should support a distribution with more density near K = 1.
Although, of course you are know doing "data-driven" hypothesis testing (multiple testing on the same data), which will create bias to discover positive results.
Best, Sergei
Although, of course you are know doing "data-driven" hypothesis testing (multiple testing on the same data), which will create bias to discover positive results.
Yes, I'm aware of that, but I will not discuss which gene is in relaxation in the other species, but just that in 'Talp' the purifying selection is smaller, and relaxation of a higher magnitude. Correct? F
Dear @francicco,
Yeah, so long as you report that you did aBSREL (which prompted you to), then did RELAX, it's OK. Looks like the signal is quite strong and could survive multiple hypothesis testing corrections.
Best, Sergei
Ok! Thank you @spond
Best, Francesco
Dear @spond, one last clarification.
This are still a bit previsional, yet with a quite good signal. This is the 2D density plot of 3 species
Clearly visible the different behaviour of Talp, which also shows a (I don't know how to call it) deviation along K=1. Those, as you can see here,
are Talp branches with a significant sign of selection (aBSREL, corrected for multiple test using Benjamini, Hochberg, and Yekutieli method).
How those branches are along K=1 and not in K>1 as expected? Because of the K error you were mentioning?
Dear @francicco,
What is the X axis in this plot? Is it the mean dN/dS inferred under aBSREL?
Best, Sergei
yes
Dear @francicco,
This is interesting.
In our RELAX paper we talk about how intensification of positive selection might look like relaxation of selection, as measured by dN/dS. So it is possible to have K < 1 while still increasing dN/dS > 1. It really depends on how much positive selection signal there is in the rest of the tree.
On the Y-axis plot the mean branch dN/dS for talp
inferred under the RELAX model. You can compute it from the ω distribution and the K
value. I would be curious to see if there is good correspondence between the two.
What proportion of sites is assigned to ω > 1 by aBSREL?
For talp
, does aBSREL allocate a lot of weight to low ω values?
What does the ω distribution look like under RELAX?
Best, Sergei
Ok, here we got:
- On the Y-axis plot the mean branch dN/dS for
Talp
inferred under the RELAX model. You can compute it from the ω distribution and theK
value.
How do I find it? It looks like it's not in the output log
- What proportion of sites is assigned to
ω
> 1 by aBSREL?
This is how WtOmegaOver1 looks along terminal branches:
And this is correlated with branch length:
- For
Talp
, does aBSREL allocate a lot of weight to low ω values?
A lots of point has WtOmegaOver1 = 0.
- What does the ω distribution look like under RELAX?
I don't know how to answer this.
best, F
Hi @spond,
- In our RELAX paper we talk about how intensification of positive selection might look like relaxation of selection, as measured by
dN/dS
. So it is possible to haveK
< 1 while still increasingdN/dS
> 1. It really depends on how much positive selection signal there is in the rest of the tree.
I think I understand this point. I'll try to reframe it to see if I understand it. If a locus has strong positive selection in the phylogeny, but less in the testing branch, this branch will show positive selection, but its proportion of ω > 1 will look smaller compared with the rest of the phylogeny. If that is true, it makes totally sense, and should means that if I take allo loci under selection in Talp
with K
< 1, they should have a smaller WtOmegaOver1 compared to other branches. And on the other side if I take loci under selection in Talp
with K
> 1, these should be under positive selection in a more species specific manner. Is that right?
Best, F
Dear @francicco,
(and 4) To compute the RELAX distribution, read the JSON file output by the analysis and extract the distribution from the model fits object (http://hyphy.org/resources/json-fields.pdf). I am curious to see what proportion of weight is attributed to very low ω values.
Interesting; it seems that talp
experience very strong selection over a small subset of sites. This means that the majority of the lineage is still conserved (which would tie back to point 0 above)
I think I understand this point. I'll try to reframe it to see if I understand it. ...
Imagine that on most branches in the tree you have strong conservation, say
ω | weight |
---|---|
0.01 | 90% |
0.25 | 9 % |
1.5 | 1 % |
Now say, you have positive selection along a small subset of sites along specific test lineage where the distribution looks like
ω | weight |
---|---|
0.1 | 70% |
0.8 | 25 % |
3 | 5 % |
Under the RELAX model, the test branch obtains its &omegas; from the background by raising them to power K, so for K = 2, you will have the test branch looking like (assuming the rest of the tree dominates the likelihood function, so changes to baseline ω are negligible).
ω | weight |
---|---|
0.0001 | 90% |
0.0625 | 9 % |
2.25 | 1 % |
and for K = 0.75:
ω | weight |
---|---|
0.1 | 90% |
0.5 | 9 % |
1.22 | 1 % |
As you can see, for K = 0.75 the transformed distribution more closely matches the true (but disallowed by the model) desired distribution than for K = 2.
Best, Sergei
Dear @spond
I think I understood the example, but I'm confused about what do you want me to plot. Would you explain it to me again, sorry.
Best, Francesco
Ok @spond,
let me give another try. This are the distribution of the proportion of "purifying selection" (as the lowest rate of omega, among the 3 rates - low, medium and high omega rates) between loci under significant positive selection with K
< 1 (right skewed density plot) Vs K
> 1 (left density plot).
According to the Wilcoxon Rank-Sum test the two distributions are significantly different (P = 0.01746). That should prove that K
is less then one because the stronger purifying selection?
Am I misinterpreting things?
Dear @spond,
Sorry to keep commenting on this topic. I'm trying to publish the results of this study (here), and I found a referee quite hostile in the way I interpret our results. To summarise. I assembled and analysed an alpine (T alpestre) and and 4 other closely related species. What I found was a significant skew in the distribution of both ω (Figure a) and K (Figure b), with the most related species (T. immigrans) with an intermediate and bimodal distribution of K. Without going too much into details, I interpret these results in this way:
The two evolutionary trajectories were predicted in our hypothesis, and while diversifying selection is somewhat expected in species adapting to a new environmental niche, the magnitude of relaxed selection found is more surprising and never recorded. Although both ω and k distributions in T. alpestre are highly skewed, the absolute number of genes under diversifying and relaxed selection are not markedly greater than other species in similar studies (Cicconardi, Marcatili, et al., 2017; Harpur et al., 2014; Roux et al., 2014). This cannot be directly ascribed to the type I error rate because the same rate should be equally randomly present in all the other short or long branches of the phylogeny. Nevertheless, T. alpestre, as part of a species complex (Steiner et al., 2010; H. C. Wagner et al., 2017), may have more recently diverged from more closely related species not yet adapted to the alpine habitat. Magnitude of this relaxed selection is evidenced by the 70 enriched biological processes, and can be seen as the consequence of a shift and/or decreased magnitude in the purifying selection.
The referee commented my consideration in this way:
by studying a single species thriving in the extreme habitat, it is impossible to infer that the pattern of genome-wide relaxed selection is due to adaptation to this habitat, and not, for instance, to a bottleneck that occurred for any other reason; there is actually no a priori reason to expect genome-wide related selection, outside of a few genes not needed anymore in the new habitat
How can I reply? He/She actually right? I don't really think a bottleneck can really give this king of signal.
Thanks a lot, Best F
Hi everyone,
I have a question concerning a study I'm doing on an alpine ant species. I'm comparing it with 21 other species looking for selection in more than 3000 single-copy. I used the aBSREL method for all branches, because I wanted to see the behaviour of the Mean_dNdS between my species and the others. Once Mean_dNdS was plotted:
I noticed significant shifted distribution in my target species (Talp). I'm interpreting this as many genes are moving away from a general purifying selection, maybe moving, at least, towards the neutrality. I would like to validate this hypothesis by running the RELAX test on my target species, and I found this distribution of K:
It looks something like a bimodal distribution, in which a group of genes (the ones with K<0.5) are gradually shifting towards the neutrality. Now I would like to run this test also on its sister species (~5Mya apart), to confront their distribution. I should expect to find a distribution with more centered around 1. Do you guys think my interpretation make sense?
Thank a lot in advance. Regards Francesco