Interpreting BUSTED-PHenotype analysis

francicco commented 2 years ago

Hi,

I'm exploring the BUSTED-PHenotype analysis. Below is the output of a run; I don't quite understand the results.

### Obtaining branch lengths and nucleotide substitution biases under the nucleotide GTR model

>kill-zero-lengths –> Yes
* Log(L) = -13403.35, AIC-c = 27012.97 (103 estimated parameters)
* 1 partition. Total tree length by partition (subs/site)  1.640

### Obtaining the global omega estimate based on relative GTR branch lengths and nucleotide substitution biases
* Log(L) = -12132.66, AIC-c = 24488.27 (111 estimated parameters)
* 1 partition. Total tree length by partition (subs/site)  1.812
* non-synonymous/synonymous rate ratio for *background* =   0.1010
* non-synonymous/synonymous rate ratio for *test* =   0.1408

### Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model
* Log(L) = -12031.98, AIC-c = 24286.91 (111 estimated parameters)
* non-synonymous/synonymous rate ratio for *background* =   0.0807
* non-synonymous/synonymous rate ratio for *test* =   0.1414

### Performing the full (dN/dS > 1 allowed) branch-site model fit
* Log(L) = -11946.53, AIC-c = 24132.14 (119 estimated parameters)
* For *test* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |    0.237    |                                   |
|        Negative selection         |     0.022     |   98.329    |                                   |
|      Diversifying selection       |    13.114     |    1.434    |                                   |

* For *background* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.054     |   98.476    |                                   |
|         Neutral evolution         |     1.000     |    1.226    |                                   |
|      Diversifying selection       |    11.982     |    0.297    |                                   |

### Performing the constrained (dN/dS > 1 not allowed) model fit
* Log(L) = -11960.15, AIC-c = 24157.36 (118 estimated parameters)
* For *test* branches under the null (no dN/dS > 1 model), the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |   88.119    |                                   |
|        Negative selection         |     0.000     |    0.359    |       Collapsed rate class        |
|         Neutral evolution         |     1.000     |   11.522    |                                   |

* For *background* branches under the null (no dN/dS > 1 model), the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.054     |   98.439    |                                   |
|         Neutral evolution         |     1.000     |    1.269    |                                   |
|      Diversifying selection       |    12.005     |    0.292    |                                   |

### Performing the constrained background (dN/dS > 1 not allowed on background branches) model fit
* Log(L) = -11961.08, AIC-c = 24159.22 (118 estimated parameters)
* For *test* branches under the null (no dN/dS > 1 on background branches model), the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |    0.213    |                                   |
|        Negative selection         |     0.021     |   98.349    |                                   |
|      Diversifying selection       |    12.968     |    1.438    |                                   |

* For *background* branches under the null (no dN/dS > 1 on background branches model), the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.039     |   95.289    |                                   |
|         Neutral evolution         |     1.000     |    3.626    |                                   |
|         Neutral evolution         |     1.000     |    1.085    |       Collapsed rate class        |

### Performing the shared distribution (same on test and background brances) model fit
* Log(L) = -11954.06, AIC-c = 24137.11 (114 estimated parameters)
* For the shared rates model (same between test and background), the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.056     |   98.922    |                                   |
|         Neutral evolution         |     1.000     |    0.675    |                                   |
|      Diversifying selection       |    12.466     |    0.402    |                                   |

----
## Branch-site unrestricted statistical test of episodic diversification and association with phenotype/trait [BUSTED-PH]
Likelihood ratio test for episodic diversifying positive selection on test branches , **p =   0.0000**.
Likelihood ratio test for episodic diversifying positive selection on background branches , **p =   0.0000**.
Likelihood ratio test for differences in distributions between **test** and **background** , **p =   0.0101**.

## Analysis summary (p = 0.05)
Selection is acting on the branches with the phenotype / trait, but is **also** acting on background branches.
There is a significant difference between test and background branches in terms of selective pressure

So it seems that BUSTED detected a signature of diversifying selection in both groups, and that there is also a significant difference between the two. I don't understand which values I should look at, or in which group positive selection is stronger or weaker.

Thanks a lot, Francesco

spond commented 2 years ago

Dear @francicco,

Looks like your alignment is subject to selection throughout the entire tree (both test and background branches). If you take point estimates as a rough guide, the test branches have ~1.4% of branch-site combinations evolving with ω ≈ 13, while the background branches have ~0.3% of branch-site combinations evolving with ω ≈ 12. Both of those are significant, statistically (p << 0.001 for both).
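To make the comparison of point estimates concrete, here is a minimal Python sketch that pulls out the ω > 1 rate class for each branch set. The numbers are copied by hand from the unconstrained (full) model tables in the output above; the variable and function names are hypothetical and not part of HyPhy. These are point estimates only, so the ratio it prints is a rough guide and not a test of direction.

```python
# Rate classes from the full (dN/dS > 1 allowed) fit, copied from the output above.
# Each entry is (dN/dS, proportion of branch-site combinations in %).
test_classes = [(0.000, 0.237), (0.022, 98.329), (13.114, 1.434)]
background_classes = [(0.054, 98.476), (1.000, 1.226), (11.982, 0.297)]


def positive_classes(classes):
    """Return the rate classes whose point estimate of dN/dS exceeds 1."""
    return [(omega, weight) for omega, weight in classes if omega > 1.0]


test_pos = positive_classes(test_classes)
background_pos = positive_classes(background_classes)

# Ratio of the weights of the ω > 1 class (test relative to background).
weight_ratio = test_pos[0][1] / background_pos[0][1]

for label, pos in (("test", test_pos), ("background", background_pos)):
    for omega, weight in pos:
        print(f"{label}: omega = {omega:.3f} at {weight:.3f}% of branch-site combinations")
print(f"weight ratio (test / background) = {weight_ratio:.2f}")
```

The two ω estimates are close to each other, so most of the apparent difference sits in the weight of the ω > 1 class, which is several times larger on the test branches.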

However, they are also different from each other, i.e. with p = 0.01 we can reject the hypothesis that ω distributions are the same for background and test branches. BUSTED-PH does not test for directions of change (one is more/less selected) directly, just that they are different. You may use something like the RELAX test to look for such directions (relaxation/intensification).
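The three p-values can be checked roughly from the log-likelihoods printed in the output. The sketch below makes two assumptions that are not stated in the thread: that each test compares the full model against one constrained model, and that the reference distribution is a plain chi-square with degrees of freedom equal to the difference in estimated parameter counts. HyPhy's own output remains authoritative; the log-likelihoods here are rounded to two decimals, so small discrepancies are expected. All names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-h) * sum(h**j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc term plus a finite series with half-integer gamma values.
    series = sum(h ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series


# (log-likelihood, number of estimated parameters), copied from the output above.
full = (-11946.53, 119)
constrained = {
    "no dN/dS > 1 on test": (-11960.15, 118),
    "no dN/dS > 1 on background": (-11961.08, 118),
    "shared distribution": (-11954.06, 114),
}

results = {}
for name, (log_l, n_params) in constrained.items():
    lrt = 2.0 * (full[0] - log_l)      # likelihood ratio statistic
    df = full[1] - n_params            # assumed degrees of freedom
    results[name] = (lrt, df, chi2_sf(lrt, df))
    print(f"{name}: LRT = {lrt:.2f}, df = {df}, p = {results[name][2]:.4g}")
```

Under these assumptions the shared-distribution comparison gives p of about 0.010, in line with the reported 0.0101, and the two selection tests give p-values far below 0.001.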

Does this make more sense?

Best, Sergei

francicco commented 2 years ago

Dear @spond,

Thanks a lot, that makes sense, and yes, I was already thinking of running RELAX to understand the direction! Thanks a lot! F

veg / hyphy

Interpreting BUSTED-PHenotype analysis #1485