veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Contrast-Fel and Comparing Multiple Groups #1674

Closed evandiego83 closed 6 months ago

evandiego83 commented 6 months ago

Hello Sergei and hyphy colleagues,

Thank you for such a great piece of software and for continually improving it.

I wish to compare the selective pressures among four different viral genotypes and would like to use contrast-fel to test the branches belonging to all 4 groups. Is contrast-fel restricted to comparing two groups, or is there a way to automatically run the pairwise comparisons of all four groups in contrast-fel? If so, how would this be done?

Also, what would be the best way to estimate dN/dS for internal vs. external branches? Would it be best to select each of these as test and reference branches and run FEL, or to use some other method?

Many thanks in advance.

Evan

spond commented 6 months ago

Dear @evandiego83,

1) If you specify more than 2 groups, contrast-fel runs:

1.1 All pairwise tests.

1.2 An "omnibus test", i.e. a test with β1 = β2 = ... = βN as the null vs. the alternative where all β are independently estimated.

For example, using the data I attach in the example, you can run

hyphy contrast-fel --alignment COX3hyPhy.fasta --tree Tree-annotated.nwk --code Invertebrate-mtDNA --branch-set birds --branch-set mammals --branch-set Leucocytozoon --branch-set Haemoproteidae

which will report the following

### **7** tests will be performed at each site
...
| Codon  |     alpha      |             beta             |        substitutions         |                  test                  |LRT p-value|Permutation p-value|
|:------:|:--------------:|:----------------------------:|:----------------------------:|:--------------------------------------:|:---------:|:-----------------:|
|   6    |        1.549   |        0.000 -      0.875    |          3, 3, 5, 5          |                overall                 |  0.0147   |      0.5000       |
|   6    |        1.549   |        0.590 :      0.584    |             3, 3             |        Haemoproteidae vs birds         |  0.0395   |      0.5000       |
|   6    |        1.549   |        0.584 :      0.095    |             3, 5             |         birds vs Leucocytozoon         |  0.0052   |      0.5000       |
|   6    |        1.549   |        0.000 :      0.095    |             5, 5             |        mammals vs Leucocytozoon        |  0.0478   |      0.5000       |
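
The count of 7 tests reported above follows from the description in point 1: four branch sets give six pairwise comparisons plus one omnibus test. A minimal Python sketch of that arithmetic is below. The function name `contrast_fel_tests` is hypothetical and not part of HyPhy, and the order of the two names within each pairwise label is illustrative only (the real output orders them differently, e.g. "Haemoproteidae vs birds").

```python
from itertools import combinations


def contrast_fel_tests(branch_sets):
    """Enumerate the per-site tests implied by the description above.

    Hypothetical helper, not a HyPhy function: one test per unordered pair
    of branch sets, plus one omnibus ("overall") test when more than two
    sets are supplied.
    """
    pairwise = [f"{a} vs {b}" for a, b in combinations(branch_sets, 2)]
    omnibus = ["overall"] if len(branch_sets) > 2 else []
    return omnibus + pairwise


groups = ["birds", "mammals", "Leucocytozoon", "Haemoproteidae"]
tests = contrast_fel_tests(groups)
for name in tests:
    print(name)
print(len(tests))  # 7 = 1 omnibus + 6 pairwise
```

With only two branch sets the same sketch yields a single pairwise test and no omnibus test.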

2) "Internal" and "Terminal" branches are built-in sets.

You can just do something like (using this dataset)

hyphy contrast-fel --alignment tests/data/HIVvif.nex  --branch-set "Internal branches" --branch-set "Terminal branches"

...

### Improving branch lengths, nucleotide substitution biases, and global dN/dS ratios under a full codon model
* Log(L) = -3487.10
* non-synonymous/synonymous rate ratio for *internal* =   0.5103
* non-synonymous/synonymous rate ratio for *leaf* =   0.8196

### For partition 1 these sites are significant at p <=0.05

| Codon  |     alpha      |             beta             |        substitutions         |                  test                  |LRT p-value|Permutation p-value|
|:------:|:--------------:|:----------------------------:|:----------------------------:|:--------------------------------------:|:---------:|:-----------------:|
|   31   |        1.346   |        6.081 :      0.000    |             7, 1             |            leaf vs internal            |  0.0257   |      1.0000       |
|   33   |        0.772   |        1.279 :     13.980    |             3, 6             |            leaf vs internal            |  0.0053   |      1.0000       |
|   92   |        1.860   |        6.041 :      0.000    |            10, 1             |            leaf vs internal            |  0.0359   |      1.0000       |
|  109   |        0.000   |        6.744 :      0.000    |             4, 0             |            leaf vs internal            |  0.0382   |      1.0000       |
|  192   |        1.282   |        0.000 :      2.773    |             0, 1             |            leaf vs internal            |  0.0166   |      0.3333       |

### ** Found _5_ sites with different _leaf vs internal_ dN/dS at p <= 0.05**

### ### False discovery rate correction
There are no sites where the overall p-value passes the False Discovery Rate threshold of 0.2
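
If you want to filter the reported sites further, the table rows above can be pulled into Python. This is only a sketch that assumes the seven-column layout shown in this output; `parse_site_rows` is a hypothetical helper, not something HyPhy provides, and the sample text is copied from the rows above.

```python
def parse_site_rows(report_text):
    """Parse data rows of the seven-column site table shown above.

    Hypothetical helper: header and separator rows are skipped because
    their first cell is not a codon number.
    """
    rows = []
    for line in report_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 7 or not cells[0].isdigit():
            continue
        rows.append({
            "codon": int(cells[0]),
            "alpha": float(cells[1]),
            "beta": cells[2],          # kept as text, e.g. "6.081 :      0.000"
            "test": cells[4],
            "lrt_p": float(cells[5]),
            "perm_p": float(cells[6]),
        })
    return rows


sample = """
| Codon  |     alpha      |             beta             |        substitutions         |                  test                  |LRT p-value|Permutation p-value|
|:------:|:--------------:|:----------------------------:|:----------------------------:|:--------------------------------------:|:---------:|:-----------------:|
|   31   |        1.346   |        6.081 :      0.000    |             7, 1             |            leaf vs internal            |  0.0257   |      1.0000       |
|   33   |        0.772   |        1.279 :     13.980    |             3, 6             |            leaf vs internal            |  0.0053   |      1.0000       |
|   92   |        1.860   |        6.041 :      0.000    |            10, 1             |            leaf vs internal            |  0.0359   |      1.0000       |
|  109   |        0.000   |        6.744 :      0.000    |             4, 0             |            leaf vs internal            |  0.0382   |      1.0000       |
|  192   |        1.282   |        0.000 :      2.773    |             0, 1             |            leaf vs internal            |  0.0166   |      0.3333       |
"""

rows = parse_site_rows(sample)
strict = [r["codon"] for r in rows if r["lrt_p"] <= 0.01]
print(len(rows))  # 5
print(strict)     # [33]
```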

Best, Sergei

Archive.zip

evandiego83 commented 6 months ago

Thanks @spond for this quick response. This seems to be running now with those commands!

However, if I have only 1 unlabeled branch and all the remaining branches belong to one of the genotypes, does having only 1 branch in the background affect the analysis? Would it reduce statistical power? Hope that makes sense.

Many thanks! Evan

spond commented 6 months ago

Dear @evandiego83,

Unlabeled branches (for contrast-fel) are "nuisance"; they won't contribute/detract much from the analyses. Power comes from having more branches/data in the test groups.

Best, Sergei

evandiego83 commented 6 months ago

Thanks!