vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0

What is a "permutation" in PERMANOVA within adonis? #448

Closed Coldgrad closed 2 years ago

Coldgrad commented 2 years ago

I'm using PERMANOVA to calculate significant differences between groups in my beta-diversity analysis for 16S rRNA gene sequencing. However, while I understand the results, I don't understand what the statistical analysis does.

I'm using the adonis function as follows:

library(vegan)
library(phyloseq)  # provides distance() and sample_data()
bray <- distance(physeq_shime3, method = "bray", type = "samples")
adonis(formula = bray ~ sample_data(physeq_shime3)$Treatment, permutations = 10000)

Permutation: free
Number of permutations: 10000

Terms added sequentially (first to last)

                                     Df SumsOfSqs  MeanSqs F.Model      R2 Pr(>F)
sample_data(physeq_shime3)$Treatment  1   0.05315 0.053155  1.2739 0.04352 0.2507
Residuals                            28   1.16834 0.041727         0.95648       
Total                                29   1.22150                  1.00000

My understanding, as far as beta-diversity analysis goes, is that the distances are calculated from the centroid of each group to the group's data points. What then? And how do permutations come into play? I've seen similar posts on this forum and others, but they don't quite answer what is being "permuted".

Thanks in advance.

gavinsimpson commented 2 years ago

I have now answered you here

You are doing the wrong analysis if you are trying to do something on beta diversity. For that you want betadisper().

In short, we permute the rows of bray (but could just as easily permute the elements of sample_data(physeq_shime3)$Treatment). This is equivalent to randomly assigning observations to groups. If there is any effect of your Treatment, then when we run adonis() on the original data we should obtain a larger value of the test statistic F.Model than when we run adonis() with the data assigned to groups at random. To get a p value, i.e. the probability of obtaining a value of F.Model at least as large as the observed one under the null hypothesis of no effect of Treatment, we have to generate the distribution of the test statistic under the null hypothesis, and we do that by computing the test statistic for many reorderings (permutations) of the data. The p value here is essentially the proportion of F.Model values, among the 10000 permutations you did plus 1 (for the observed ordering), that were as large as or larger than the observed value of 1.2739.

In your case about 2506 of the 10000 values of F.Model that we recorded when randomly assigning your data to groups were as large as or larger than 1.2739, which gives (2506 + 1) / (10000 + 1) ≈ 0.2507. Hence the observed result is consistent with the null hypothesis of no effect of Treatment on the response. Put another way, the evidence against the null hypothesis of no effect is weak.
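To make the procedure concrete, here is a minimal base-R sketch of a permutation test on a distance matrix. It is not the code adonis() runs internally; the simulated data, the Euclidean distances, and the helper name pseudo_F are all assumptions made for illustration. The pseudo-F is computed from sums of squared distances, which is the usual way to partition a distance matrix by group.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: 30 samples x 5 taxa, two groups of 15
n <- 30
comm <- matrix(rpois(n * 5, lambda = 10), nrow = n)
group <- factor(rep(c("A", "B"), each = n / 2))
d <- as.matrix(dist(comm))  # Euclidean distances, for simplicity

# Pseudo-F from a distance matrix: partition the sum of squared
# distances into among-group and within-group parts
pseudo_F <- function(d, group) {
  N <- nrow(d)
  a <- nlevels(group)
  d2 <- d^2
  ss_total <- sum(d2[upper.tri(d2)]) / N
  ss_within <- sum(vapply(levels(group), function(g) {
    idx <- which(group == g)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)
  }, numeric(1)))
  ss_among <- ss_total - ss_within
  (ss_among / (a - 1)) / (ss_within / (N - a))
}

F_obs <- pseudo_F(d, group)

# Null distribution: recompute the statistic with group labels shuffled
n_perm <- 999
F_perm <- replicate(n_perm, pseudo_F(d, sample(group)))

# Permutation p value: the observed ordering counts as one permutation
p_value <- (sum(F_perm >= F_obs) + 1) / (n_perm + 1)
p_value
```

Shuffling the group labels and shuffling the rows of the distance matrix are two views of the same operation; the labels are shuffled here only because it is shorter to write.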

We're basically doing a multivariate ANOVA, but we don't believe that the pseudo-F statistic we compute has the properties of a real F statistic, and hence we can't assume that the sampling distribution for that statistic under the null hypothesis is an F distribution with (in this case) 1 and 28 degrees of freedom. So we generate the sampling distribution under the null hypothesis via a permutation test instead.
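For comparison, this is the p value you would get if you did treat the pseudo-F as a true F statistic with 1 and 28 degrees of freedom. It is shown only to illustrate the parametric shortcut that the permutation test replaces; the variable names are made up for this sketch.

```r
# Values taken from the adonis() output above
F_observed <- 1.2739
df_treatment <- 1
df_residual <- 28

# Upper-tail probability under an F(1, 28) reference distribution
p_parametric <- pf(F_observed, df_treatment, df_residual, lower.tail = FALSE)
p_parametric
```

The permutation p value (0.2507) is the one to report, because it does not rely on the F distribution being the right reference for a statistic computed from dissimilarities.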

If this still doesn't help, you should do some reading on randomisation and permutation tests. We do the same procedure in many functions in {vegan}, including in betadisper(); the only thing that varies is the test statistic used.
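As an illustration of the same permutation machinery applied with a different test statistic, here is a hedged sketch using betadisper() followed by permutest(). It uses the dune example data shipped with {vegan} rather than the data from the question, and the grouping variable (Management) is chosen purely for demonstration.

```r
library(vegan)

# Example data shipped with vegan (not the data from the question)
data(dune)
data(dune.env)

# Bray-Curtis dissimilarities between sites
d <- vegdist(dune, method = "bray")

# Distances of each site to its group centroid
mod <- betadisper(d, dune.env$Management)

# Permutation test of whether group dispersions differ
permutest(mod, permutations = 999)
```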

Assumption-wise, the main assumption here is that the observations are exchangeable under the null hypothesis, i.e. they can be permuted at random. If the samples can't be permuted at random, then this is not a valid test. In that case you might be able to constrain the permutation test to disallow reorderings of the data that shouldn't be allowed. For example, if your data were repeated observations on a number of subjects, then you can't swap observations between subjects, so we have to restrict the reordering of the data to happen within subjects. This kind of restricted permutation can be done using the tools in the {permute} package, which {vegan} uses behind the scenes to create the permutations of the data used in these tests.
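A minimal sketch of such a restricted design with {permute}, assuming a hypothetical layout of 5 subjects with 4 observations each. The subject factor is invented for illustration; with real data it would come from your sample metadata.

```r
library(permute)

# Hypothetical design: 5 subjects, 4 repeated observations each
subject <- gl(5, 4)

# Treat subjects as blocks: observations are shuffled within a subject
# and never exchanged between subjects
ctrl <- how(blocks = subject, nperm = 999)

set.seed(1)  # arbitrary seed
perm <- shuffle(length(subject), control = ctrl)

# Every permuted index still belongs to the same subject
all(subject[perm] == subject)
```

A control object like ctrl can be supplied as the permutations argument in place of a plain number, which is how the restriction reaches the test.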