vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0

What does "method='bray'" do when I pass in a distance matrix? I don't need a Bray-Curtis calculation--I already have a distance matrix. #330

Closed abalter closed 2 years ago

abalter commented 5 years ago

Suppose I have a distance matrix created by applying the weighted UniFrac measure. For instance, using a phyloseq object:

weighted_unifrac <- UniFrac(
  physeq = ps,
  weighted = TRUE,
  normalized = TRUE,
  parallel = TRUE,
  fast = TRUE
)

weighted_unifrac is now a distance matrix. Then I perform PERMANOVA using adonis, writing out the default method parameter explicitly:

permanova <- adonis(
  formula = weighted_unifrac ~ CaseString,
  data = sample_data,
  permutations = 999,
  method = 'bray'
)

What is method doing here? I think of Bray-Curtis as a way to generate the distance matrix itself. How would my choice of another method value affect my PERMANOVA analysis?

jarioksa commented 4 years ago

Have you tried? What did the output say to you?

If you supply a distance structure to metaMDS, it will be used as such and the argument method is ignored. This is also shown in the output: the output has a line Distance: which gives information on the dissimilarities used. If it says something other than Distance: bray, then that something else was used.

The only complication is that you must supply either a structure that holds dissimilarities or distances as defined in R and can be recognized as such, or a symmetric square matrix. There are some non-conforming R packages that do not supply such results although they say they produce distances or dissimilarities. I have no idea which package you used to find UniFrac distances, nor do I know how its result is structured. If the result is not a legal R distance object, you should convert it to one, and it will then be recognized as such in metaMDS.

gavinsimpson commented 4 years ago

Where @jarioksa says metaMDS(), just read adonis().

From ?adonis we have

method: the name of any method used in vegdist to calculate pairwise distances if the left hand side of the formula was a data frame or a matrix.

So for adonis() the documentation says you must pass it an actual "dist" object. So check what UniFrac() returns. If it returns a square symmetric matrix, you'll need to convert that to a "dist" classed object with as.dist().

If you pass it a "dist" classed object then method does nothing and thus won't affect anything. But to be sure you need to know what UniFrac() gives you and then follow what adonis() expects to be given if you want to pass it dissimilarities directly.
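
The check-then-convert step described above can be sketched in base R. This is only an illustration: d_mat is a hypothetical stand-in with made-up values, not real UniFrac() output, and it assumes the result arrives as a square symmetric matrix.

```r
# Hypothetical stand-in for a UniFrac() result: a square symmetric
# matrix of made-up dissimilarities between three samples.
d_mat <- matrix(c(0.0, 0.3, 0.5,
                  0.3, 0.0, 0.4,
                  0.5, 0.4, 0.0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("s1", "s2", "s3"),
                                c("s1", "s2", "s3")))

# Inspect what we actually have before handing it to adonis().
class(d_mat)
inherits(d_mat, "dist")   # FALSE: a plain matrix, not a "dist" object

# Convert only if it is not already a "dist" object.
d <- if (inherits(d_mat, "dist")) d_mat else as.dist(d_mat)

inherits(d, "dist")       # TRUE
attr(d, "Labels")         # sample names are carried over from the row names
```

The resulting d can then go on the left-hand side of the formula, in which case method should have no effect.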

jarioksa commented 4 years ago

Actually, both adonis (+ adonis2) and metaMDS have a similar test for input: if the input inherits from "dist" or can be seen as a symmetric matrix, it is used as such and no dissimilarities are calculated. If, for instance, you have dissimilarities only in the lower triangle of a square matrix, the matrix is non-symmetric, so it will not be regarded as a dissimilarity matrix but will be used as a raw data matrix. Usually the easiest way is to first use as.dist on your input, and if it works correctly, you can use its result safely in metaMDS, adonis etc.
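
A small base R sketch of the lower-triangle pitfall, using made-up numbers (the object names full and lower are only for illustration):

```r
# Build a proper set of distances from four made-up values,
# then expand them to a full symmetric matrix.
full <- as.matrix(dist(matrix(c(1, 2, 4, 7), ncol = 1)))

# Simulate a package that fills only the lower triangle.
lower <- full
lower[upper.tri(lower)] <- 0

# The half-filled matrix is not symmetric, so a symmetry test
# would not recognize it as dissimilarities.
isSymmetric(unname(lower))          # FALSE

# as.dist() reads the lower triangle and gives a legal "dist" object.
d <- as.dist(lower)
inherits(d, "dist")                 # TRUE
isSymmetric(unname(as.matrix(d)))   # TRUE

# The recovered distances match the original full matrix.
all.equal(as.matrix(d), full, check.attributes = FALSE)   # TRUE
```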

Here is the test we use in metaMDS (from metaMDSdist.R), and the tests are essentially similar in other functions which assume "dist" input:

    ## metaMDSdist should get a raw data matrix, but if it gets a
    ## 'dist' object return that unchanged and quit silently.
    if (inherits(comm, "dist")  ||
        ((is.matrix(comm) || is.data.frame(comm)) &&
             isSymmetric(unname(as.matrix(comm)))))
        return(comm)
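
To see how that condition behaves on different inputs, it can be wrapped in a standalone helper. The name is_dist_like is hypothetical and not part of vegan; the body only repeats the condition quoted above, and the data are made up.

```r
# Hypothetical helper: TRUE when the input would be used as
# dissimilarities directly, FALSE when it would be treated as raw data.
is_dist_like <- function(comm) {
    inherits(comm, "dist") ||
        ((is.matrix(comm) || is.data.frame(comm)) &&
             isSymmetric(unname(as.matrix(comm))))
}

raw <- matrix(c(5, 0, 2, 1, 3, 4), nrow = 3)   # 3 sites x 2 species
d   <- dist(raw)                               # a "dist" object
sym <- as.matrix(d)                            # square symmetric matrix

is_dist_like(d)                    # TRUE
is_dist_like(sym)                  # TRUE
is_dist_like(as.data.frame(sym))   # TRUE
is_dist_like(raw)                  # FALSE: not square, so not symmetric
```

One consequence of this condition is that any square symmetric matrix passes it, whatever it actually contains.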

gavinsimpson commented 4 years ago

I think we need to change the documentation then, as ?adonis at least implies that a dissimilarity matrix must be supplied as a "dist" class object:

\item{formula}{Model formula. The LHS must be either a community
data matrix or a dissimilarity matrix, e.g., from
\code{\link{vegdist}} or \code{\link{dist}}.  If the LHS is a data
matrix, function \code{\link{vegdist}} will be used to find the
dissimilarities. The RHS defines the independent variables. These
can be continuous variables or factors, they can be transformed
within the formula, and they can have interactions as in a typical
\code{\link{formula}}.}

but the code does check, when a matrix is passed, whether that matrix is symmetric and, if it is, coerces it to the required dist class.

jarioksa commented 4 years ago

@gavinsimpson We have similar handling of input dissimilarities in adonis, anosim, bioenv, capscale, dbrda, metaMDS, monoMDS, mrpp, pcnm and spantree. I once (years ago) went through these functions for consistent handling of dissimilarities, but I am not sure if I changed the documentation accordingly.

gavinsimpson commented 4 years ago

@jarioksa OK; I'll take a look later this week and see if any of the .Rd files need updating, as well as modifying the one for adonis