veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

RELAX: determining if a shift in selection intensity is driven more by changes in positive or purifying selection #1595

Closed mbarkdull closed 1 year ago

mbarkdull commented 1 year ago

Good afternoon,

I have a question about how to determine if a shift in selection intensity is being driven by changes in positive selection, purifying selection, or both, using the ω values that are returned by RELAX.

For my current project, a reviewer has asked whether I looked at "the changes of the three ω classes [in the RELAX results] to have an understanding whether the [observed shifts in selection intensity] are happening mostly to the sites under purifying selection or to the sites under positive selection".

I am hopeful you can comment on how best to approach this point. My understanding is that RELAX infers three ω classes for the reference branches, and determines the proportion of sites evolving under each class. Currently, my thought is that to address the reviewer's point, there are three things I could consider:

  1. In some genes, the proportion of sites evolving under each ω class is the same in the reference and test branches, but the actual ω values vary.
    • In this case, for each ω class, I could calculate the shift in ω value between the reference and test branch sets.
  2. In some genes, inferred ω values are the same for the reference and test branches, but the proportion of sites in each class varies.
    • In this case, for each ω class, I could calculate the shift in proportion of sites evolving under that ω value.
  3. But, in some genes, both the ω values for each class and the proportion of sites in each ω class vary between reference and test.
    • So I would somehow need to capture a combination of those two factors (not sure how I would do this).
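For illustration only, the first two comparisons in the list above could be sketched as below. The `RateClass` type, the `class_shifts` helper, and all numeric values are hypothetical and do not come from any real RELAX fit; the sketch assumes each branch set is summarized as the same number of (ω, proportion) classes listed in the same order. It is purely descriptive and is not a formal test.

```python
from typing import NamedTuple


class RateClass(NamedTuple):
    """One omega class: its omega value and the proportion of sites in it."""
    omega: float
    proportion: float


def class_shifts(reference, test):
    """Return per-class (delta omega, delta proportion), test minus reference."""
    if len(reference) != len(test):
        raise ValueError("branch sets must have the same number of classes")
    return [
        (t.omega - r.omega, t.proportion - r.proportion)
        for r, t in zip(reference, test)
    ]


# Hypothetical values chosen for illustration; not from a real analysis.
reference = [RateClass(0.05, 0.70), RateClass(0.80, 0.25), RateClass(4.0, 0.05)]
test = [RateClass(0.10, 0.70), RateClass(0.85, 0.25), RateClass(3.0, 0.05)]

shifts = class_shifts(reference, test)
for index, (d_omega, d_prop) in enumerate(shifts):
    print(f"class {index}: delta omega = {d_omega:+.3f}, delta proportion = {d_prop:+.3f}")
```

In this made-up example the proportions are identical and only the ω values move, which corresponds to case 1; case 3 would show non-zero entries in both columns.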

In essence, do you have any suggestions for how to determine if the shift in selection intensity is being driven more strongly by changes in the strength of purifying selection vs. changes in the strength of positive selection? Or perhaps this is not possible/practical.

I am happy to share my RELAX results if that is useful.

Thank you so much for your feedback, and for developing these analyses! Megan

spond commented 1 year ago

Dear @mbarkdull,

Your typical RELAX analysis will infer 3 ω values for the reference branches (with corresponding weights) and map them to ω^K for the test branches. So, unless ω = 0 or ω = 1 or K = 1, the entire distribution of values will change. Now, assuming that K ≠ 1 with a significant p-value, RELAX does not tell you at all which of the ω components contribute to the significant result. You know that some do, but you don't know if it's mostly explained by changes for ω < 1 or for ω > 1 or both.
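As a small numerical illustration of the mapping described above (each ω raised to the power K), the sketch below uses a hypothetical helper `map_omegas` and made-up ω values. It only demonstrates the arithmetic: ω = 0 and ω = 1 are fixed points, and every other class moves at once.

```python
def map_omegas(omegas, k):
    """Raise every omega in the baseline distribution to the power k."""
    return [omega ** k for omega in omegas]


# Hypothetical baseline omega values, chosen for illustration only.
baseline = [0.0, 0.1, 1.0, 5.0]

relaxed = map_omegas(baseline, 0.5)      # K < 1: values move toward 1
intensified = map_omegas(baseline, 2.0)  # K > 1: values move away from 1

for omega, r, i in zip(baseline, relaxed, intensified):
    print(f"omega = {omega:.2f} -> K=0.5: {r:.4f}, K=2: {i:.4f}")
```

Because a single K acts on all classes simultaneously, a significant K ≠ 1 cannot by itself be attributed to the ω < 1 classes or the ω > 1 class.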

You can run one of the descriptive models in RELAX to see how the distributions differ between test and reference branches under less restrictive mappings, but these models do not perform a formal test of relaxation/intensification (hence "descriptive"). Sounds like you may have used those. Can you provide me with an example or two of your RELAX fits?

If you want to test more formally, within the hypothesis testing framework, which of the selective regimes (ω < 1 or ω > 1) is more responsible for the shifts, you could modify RELAX to test for relaxation/intensification separately for ω < 1 and ω > 1. This would require refitting the data with such "disaggregated models".
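One possible reading of such a "disaggregated" mapping, offered purely as an assumption and not as the actual modified RELAX, is to give the ω < 1 and ω > 1 regimes their own exponents. The function name and the parameters `k_purifying` and `k_positive` below are hypothetical.

```python
def map_omega_disaggregated(omega, k_purifying, k_positive):
    """Apply a separate exponent to omega depending on its selective regime.

    Assumed parameterization for illustration only: omega < 1 uses
    k_purifying, omega > 1 uses k_positive, and omega == 1 is unchanged.
    """
    if omega < 1.0:
        return omega ** k_purifying
    if omega > 1.0:
        return omega ** k_positive
    return 1.0


# Hypothetical values: purifying classes relaxed, positive class intensified.
print(map_omega_disaggregated(0.25, k_purifying=0.5, k_positive=2.0))
print(map_omega_disaggregated(4.0, k_purifying=0.5, k_positive=2.0))
```

Under this assumed form, testing each exponent against 1 separately would indicate which regime shows evidence of a shift, at the cost of refitting the data.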

The changes are not that hard to make, and I could probably provide a version of RELAX that does that sometime next week.

Best, Sergei

mbarkdull commented 1 year ago

Dear @spond,

Thank you for that description of what RELAX is doing; that is helpful. My initial comment was based on looking at the six omega classes and corresponding proportions provided for the "RELAX alternative" model.

I will email you two of my RELAX .json outputs.

Since it does not appear that my reviewer's question can be satisfactorily answered with my existing results, and since I ran RELAX on tens of thousands of genes, I think I can just respond to them that this is not practical for my project. However, I really appreciate your offer to release a modified version of RELAX! I suppose it might be useful to others in the future.

Thank you for taking the time to answer this question, and all the other Github issues I've opened here over the year! Megan