veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Hi! I'm confused about some usages. How do I test whether a branch (a group of species) has been under selection? #1632

Closed kuangzhuoran closed 11 months ago

kuangzhuoran commented 1 year ago

I want to test for positive selection along an entire branch.

I have tried 'hyphy absrel' with the "--branches" parameter. It seems '--branches' only specifies which species are tested, and the rest get a null value. So I took the intersection across the target species, but no gene was under positive selection in all of them.

Then I wanted to try BUSTED, but in the documentation I saw: "Importantly, a significant result does not mean that the gene evolved under positive selection along the entire foreground."

spond commented 1 year ago

Dear @kuangzhuoran,

I am sorry, I don't understand your question. Can you please clarify exactly what you would like to test?

I am guessing you might want to see if the mean dN/dS (ω) along a particular branch is >1?

Best, Sergei

kuangzhuoran commented 1 year ago

Yes... Or whether this gene is under positive selection in all species of a particular branch. I have checked the output of 'hyphy absrel'; maybe I need to look at the node of a particular branch?

For example: (((((sppA, sppB)Node1, (sppC, sppD)Node2)Node3, sppE)Node4, sppF)Node5, sppG)

If I want to study species sppA, sppB, sppC and sppD (i.e. test whether the mean dN/dS (ω) along this particular branch is > 1), do I need to set the 'Foreground' to Node3? (Both hyphy absrel and hyphy busted would be set up this way.)

kuangzhuoran commented 1 year ago

Sorry, I mean I only need to check the result of Node3

spond commented 1 year ago

Dear @kuangzhuoran,

I see two questions here.

1). How to test if the mean dN/dS > 1 along a specific branch?

aBSREL and other models in HyPhy like BUSTED will test whether or not a fraction of sites at a branch has dN/dS>1. This is because instances where the entire branch is under selection (or average dN/dS > 1) are very very rare, so you are likely to miss signatures of selection that are less extreme. I am sure there might be specific reasons why you'd like to run this test, but please be aware that it will have (very) low power in general.

You can run this test with FitMG94, using something like

hyphy FitMG94.bf --alignment file.fas --type local --lrt Yes
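A per-branch test of this kind is a likelihood-ratio test (LRT), and the arithmetic can be sketched in a few lines. The sketch below is not HyPhy code and does not read HyPhy output: the log-likelihoods and ω value are hypothetical inputs, and it assumes the null model fixes ω = 1 on one branch (one degree of freedom). Because that test is two-sided (ω different from 1), a positive-selection call also has to check that the estimate is above 1.

```python
import math

def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood-ratio test with 1 degree of freedom.

    lnl_null: log-likelihood with omega fixed to 1 on the branch (hypothetical input)
    lnl_alt:  log-likelihood with omega free on the branch (hypothetical input)
    Returns (statistic, p-value).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-squared distribution with 1 d.f.
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

def mean_omega_above_one(omega, p, alpha=0.05):
    """True only if omega differs significantly from 1 AND the estimate is > 1."""
    return p < alpha and omega > 1.0

# Hypothetical numbers for one branch of one gene.
stat, p = lrt_pvalue(lnl_null=-1002.0, lnl_alt=-1000.0)
print(round(stat, 3), round(p, 4), mean_omega_above_one(omega=1.8, p=p))
```

Screening many genes this way at P < 0.05 would normally also need a multiple-testing correction, which this sketch leaves out.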

2). How to frame the question about where in the clade selection takes place?

In your example


(a). If you test for selection ONLY on Node3, this is asking a question about what happened on the branch basal to the entire clade or "Did selection operate during the emergence of the clade?". The presumption would be that something happened that all the descendants shared (e.g. new phenotype due to migration etc), and maintained.


(b). If you test for selection on all of the branches in the clade, then the question might be "Did selection operate on the clade during its entire existence?". A biological imperative might be something that exerts selection continuously (e.g. a host-specific pathogen and immune genes).

Best, Sergei
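The two framings (a) and (b) differ only in which branches of the tree carry the foreground label. The sketch below builds both labelled trees from the example Newick string in this thread, using HyPhy's `{Label}` annotation after a node name. The helper names (`clade_span`, `tag_stem`, `tag_clade`) are made up for illustration, and the code assumes node names start with a letter and that internal nodes are named in the Newick string.

```python
import re

def clade_span(newick, node):
    """Return (start, end) of the subtree '(...)node' inside the Newick string."""
    m = re.search(r"\)" + re.escape(node) + r"(?![\w.])", newick)
    if m is None:
        raise ValueError(f"internal node {node!r} not found")
    depth = 0
    # Walk left from the closing ')' to its matching '('.
    for i in range(m.start(), -1, -1):
        if newick[i] == ")":
            depth += 1
        elif newick[i] == "(":
            depth -= 1
            if depth == 0:
                return i, m.end()
    raise ValueError("unbalanced parentheses")

def tag_stem(newick, node, label="Foreground"):
    """Framing (a): label only the branch leading to `node`."""
    _, end = clade_span(newick, node)
    return newick[:end] + "{" + label + "}" + newick[end:]

def tag_clade(newick, node, label="Foreground"):
    """Framing (b): label the stem of `node` and every branch below it."""
    start, end = clade_span(newick, node)
    # A name is a letter-initial token right after '(', ',', ')' or whitespace,
    # so branch lengths such as ':1e-05' are left alone.
    tagged = re.sub(r"(?<=[(,)\s])([A-Za-z_][\w.]*)",
                    r"\1{" + label + "}", newick[start:end])
    return newick[:start] + tagged + newick[end:]

tree = "(((((sppA, sppB)Node1, (sppC, sppD)Node2)Node3, sppE)Node4, sppF)Node5, sppG)"
print(tag_stem(tree, "Node3"))
print(tag_clade(tree, "Node3"))
```

Either labelled tree could then be passed to an analysis with `--branches Foreground`, e.g. `hyphy absrel --alignment file.fas --tree labelled.nwk --branches Foreground` (file names here are placeholders). Whether the stem branch of Node3 belongs in framing (b) is a judgment call; this sketch includes it.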

kuangzhuoran commented 1 year ago

Thank you for your reply. In my case, sppA, sppB, sppC and sppD are sister species and share the same characteristics, such as all being adapted to low-oxygen environments. I was hoping to find some selective signatures that they have in common. Now I am no longer sure which module to choose for my analysis.

I have tried FitMG94.bf for the "mean dN/dS > 1 along a specific branch" test, and I think it will work fine.

And I understand what you're saying about Node3. I need to test for selection on all of the branches in the clade, because my study species live in the same environment and have faced the same environmental pressures from their common ancestor all the way to the present day.

As you say, "This is because instances where the entire branch is under selection (or average dN/dS > 1) are very very rare". My actual results agree: almost no genes are under positive selection on sppA, sppB, sppC and sppD simultaneously (P < 0.05 and ω > 1).

spond commented 1 year ago

Dear @kuangzhuoran,

You may also want to look at https://github.com/veg/hyphy-analyses/tree/master/BUSTED-PH

Best, Sergei

kuangzhuoran commented 1 year ago

Thanks for your reply. I have read "https://github.com/veg/hyphy-analyses/tree/master/BUSTED-PH".

I think I first need to find the genes for which test2 is significant (a constrained model is fitted, where ω ≤ 1 is enforced on the test branches).

To distinguish my study species from the rest, the next step is to find the genes for which test3 is not significant (a constrained model is fitted, where ω ≤ 1 is enforced on the background branches). I think the results at this point are indicative of adaptation.

Finally, to get statistical support for a difference, I keep the genes for which test4 is significant (a constrained model is fitted, where the ω distribution is the same for test and background branches).

spond commented 1 year ago

Dear @kuangzhuoran,

Yes, that is correct.

  1. Selection on foreground. ✓
  2. Selection on background. ✗
  3. Foreground and background are different. ✓

Best, Sergei
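The three-way check confirmed above can be written as a small decision function for screening many genes. This is a sketch only: the argument names are hypothetical and are not the keys of the BUSTED-PH JSON output, the wording of the returned labels is illustrative, and the 0.05 threshold is taken from the discussion above.

```python
def busted_ph_call(p_foreground, p_background, p_difference, alpha=0.05):
    """Classify one gene from three p-values (hypothetical names).

    p_foreground: test for selection on the test (foreground) branches
    p_background: test for selection on the background branches
    p_difference: test that foreground and background omega distributions differ
    """
    fg = p_foreground < alpha
    bg = p_background < alpha
    diff = p_difference < alpha
    if fg and not bg and diff:
        return "selection associated with the foreground"
    if fg and bg:
        return "selection on foreground and background"
    if fg:
        return "selection on foreground, but no significant difference from background"
    return "no evidence of selection on foreground"

# Hypothetical p-values for three genes.
genes = {
    "geneA": (0.001, 0.60, 0.010),
    "geneB": (0.001, 0.02, 0.300),
    "geneC": (0.400, 0.70, 0.900),
}
for name, pvals in genes.items():
    print(name, "->", busted_ph_call(*pvals))
```

Only the first outcome matches all three conditions in the list above (✓, ✗, ✓).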
