qiime2 / q2-diversity

BSD 3-Clause "New" or "Revised" License

upgrade to adonis2 #328

Open: opened by nbokulich 2 years ago

nbokulich commented 2 years ago

Improvement Description
Upgrade to the vegan function adonis2() to access functionality that is unavailable in the original adonis() function (called adonis1 in this issue).

To quote @mestaki :

The most important benefit would be that adonis2 allows marginal testing of variables using the by="margin" parameter, and the output gives a partial R-squared for each variable in the model.
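For reference, in the usual PERMANOVA formulation (the notation here is ours, not vegan's), the two testing modes differ only in how a term's sum of squares SS_j is defined, and the R2 column vegan prints is that SS_j divided by the total sum of squares. SS(M) is the sum of squares explained by model M, M_{1..j} is the model containing the first j terms of the formula (with M_{1..0} the intercept-only model, which explains nothing), and M_full \ j is the full model with term j dropped:

```latex
\begin{aligned}
R^2_j &= \frac{SS_j}{SS_{\text{total}}} \\
SS_j^{\text{terms}} &= SS(M_{1..j}) - SS(M_{1..j-1}) \\
SS_j^{\text{margin}} &= SS(M_{\text{full}}) - SS(M_{\text{full} \setminus j})
\end{aligned}
```

The dune tables later in this thread are consistent with this: under by="terms", Management's R2 is 1.47 / 4.30 ≈ 0.342, and the sequential rows add up to the total (1.47 + 0.441 + 2.39 ≈ 4.30), whereas the by="margin" rows do not (1.19 + 0.441 + 2.39 ≈ 4.02) because Management and A1 share some explained variation. Judging from those values, the reported R2 is a share of the total sum of squares rather than a partial R2 in the regression sense, SS_j / (SS_j + SS_residual).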

Current Behavior
q2-diversity's adonis currently calls the original adonis() (adonis1) from vegan, which lacks some of the functionality of adonis2().

To quote @mestaki :

the current QIIME 2 version of adonis can only perform a sequential test of terms (equivalent to by="terms" in adonis2()). This means the order of the variables in the formula matters: the residuals from the first term are passed on to the second term, then the residuals from the second term to the third, and so on. I suspect most users are not aware of this, and it is easy to miss an important biological signal in your model if a term of interest is unintentionally shoved to the end of the formula.
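To make the order dependence concrete, here is a minimal base-R sketch (no vegan required) of how PERMANOVA-style sums of squares can be partitioned from a Gower-centred distance matrix. It assumes the standard trace formulation, uses Euclidean distances on simulated data for simplicity (vegan's adonis2() defaults to Bray-Curtis), and skips the permutation test entirely, so it is not vegan's implementation. The helper names gower_centre(), fitted_ss(), ss_terms() and ss_margin() are hypothetical.

```r
# Minimal base-R sketch of PERMANOVA-style sum-of-squares partitioning.
# Assumptions: standard Gower-centred trace formulation, Euclidean distances,
# no permutation test. Not vegan's code; all names and data are hypothetical.

# Gower-centre a distance matrix: G = C A C, where A[i, j] = -d[i, j]^2 / 2
# and C = I - 11'/n is the centring matrix
gower_centre <- function(d) {
  a <- -0.5 * as.matrix(d)^2
  n <- nrow(a)
  cmat <- diag(n) - matrix(1 / n, n, n)
  cmat %*% a %*% cmat
}

# Sum of squares explained by a one-sided formula: trace(H G),
# where H is the hat matrix of the model's design matrix
fitted_ss <- function(rhs, data, G) {
  X <- model.matrix(rhs, data)
  sum(diag(qr.fitted(qr(X), G)))
}

# Sequential (like by = "terms"): each term is adjusted only for earlier terms
ss_terms <- function(terms, data, G) {
  out <- numeric(length(terms))
  prev <- 0  # the intercept-only model explains nothing after centring
  for (i in seq_along(terms)) {
    cur <- fitted_ss(reformulate(terms[seq_len(i)]), data, G)
    out[i] <- cur - prev
    prev <- cur
  }
  setNames(out, terms)
}

# Marginal (like by = "margin"): each term is adjusted for all other terms
ss_margin <- function(terms, data, G) {
  full <- fitted_ss(reformulate(terms), data, G)
  out <- vapply(seq_along(terms), function(i) {
    others <- if (length(terms) == 1) "1" else terms[-i]
    full - fitted_ss(reformulate(others), data, G)
  }, numeric(1))
  setNames(out, terms)
}

# Simulated data: 'ph' is correlated with 'site', so the two terms overlap
set.seed(2022)
n <- 30
env <- data.frame(site = factor(rep(c("a", "b", "c"), each = 10)))
env$ph <- as.integer(env$site) + rnorm(n)
comm <- matrix(rpois(n * 8, lambda = 5), nrow = n)
comm[, 1] <- comm[, 1] + 3 * env$ph  # first column responds to ph
G <- gower_centre(dist(comm))        # dist() defaults to Euclidean

ss_terms(c("site", "ph"), env, G)   # site entered first
ss_terms(c("ph", "site"), env, G)   # same terms, different sequential SS
ss_margin(c("site", "ph"), env, G)  # does not depend on term order
```

Swapping the order of site and ph changes the sequential sums of squares but not the marginal ones. In either order the sequential values add up to the full model's sum of squares, and the last term's sequential value equals its marginal value, which is why by="terms" and by="margin" agree on the final term.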

Proposed Behavior
Swap the functions, but keep the defaults matching the current adonis1 behavior (this should surely be possible with adonis2).

Also expose marginal testing and the other options available in adonis2 (see the related open issues about adonis parameters, e.g., #242).

References
Raised by @mestaki on the QIIME 2 forum.

jwdebelius commented 2 years ago

I would love it if this refactor (which I'm happy to help with) produced an artifact rather than a visualization. The results could then be passed into the current visualizer, and it would make adonis/adonis2 easier to use from the Python interface for multiple testing or for comparing models.

Again, happy to help!

ebolyen commented 2 years ago

Hey @jwdebelius, @lizgehret and I have been working on reporting statistical outputs as artifacts (in no small part because of your suggestions at the last workshop). If you'd like to see how we've been doing it, you can find it here: https://github.com/qiime2/q2-fmt. We'd be super interested in any feedback you can give.

We're using Tabular Data Resources as the backing format because it makes the data a bit more standardized for alternate environments and lets us attach metadata to the columns, so we can preserve things like labels and other info (ideally mirroring the convenience of R a little bit).

Maybe these can serve as a template for the Adonis refactor? It may be worth having an offline conversation about what our plans are in the near-term for this stuff. I think it would be really great to have you on-board! (cc @gregcaporaso and @nbokulich)

jwdebelius commented 2 years ago

Thanks @ebolyen,

I will check it out, thank you! I'm happy to talk offline and contribute if I can, though I'm not too familiar with Tabular Data Resources.

mestaki commented 2 years ago

Thanks @nbokulich for starting this and @jwdebelius for offering to help out! I'm happy to help any way I can as well, but I'll be nowhere near as effective as Justine, lol.

Proposed Behavior Swap the functions, but keep the defaults to match the current adonis1 functionality (this should be possible with adonis2 surely).

Yes, totally, there are a few options here. The updated vegan package comes with both adonis() and adonis2().
Option 1: keep adonis() as the default to avoid any backward-compatibility issues, and add adonis2 as a separate plugin.
Option 2: use adonis() as the default and switch to adonis2() if the user adds the by="margin" parameter.
Option 3: just use adonis2() with by="terms" as the default, which is identical to the current adonis() (as shown below), and expose the by parameter so users can switch when needed.
My vote is Option 3, by far.

with adonis()


library(tidyverse)
library(vegan)
library(broom)
data(dune)
data(dune.env)

set.seed(2022) 
adonis(dune ~ Management+A1, data = dune.env) %>% #build model
.$aov.tab %>% #grab summary table
broom::tidy() #broom's tidy cleans up results in a tibble. The warnings can be ignored.

'adonis' will be deprecated: use 'adonis2' instead
# A tibble: 4 × 7
  term          df SumsOfSqs MeanSqs F.Model    R2 p.value
  <chr>      <dbl>     <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Management     3     1.47    0.490    3.07 0.342   0.004
2 A1             1     0.441   0.441    2.77 0.103   0.023
3 Residuals     15     2.39    0.159   NA    0.556  NA    
4 Total         19     4.30   NA       NA    1      NA    
Warning message:
In tidy.anova(.) :
  The following column names in ANOVA output were not recognized or transformed: SumsOfSqs, MeanSqs, F.Model, R2

with adonis2, by="terms"

set.seed(2022)
adonis2(dune ~ Management+A1, data = dune.env, by="terms") %>%
broom::tidy(.)

# A tibble: 4 × 6
  term          df SumOfSqs    R2 statistic p.value
  <chr>      <dbl>    <dbl> <dbl>     <dbl>   <dbl>
1 Management     3    1.47  0.342      3.07   0.004
2 A1             1    0.441 0.103      2.77   0.023
3 Residual      15    2.39  0.556     NA     NA    
4 Total         19    4.30  1         NA     NA    
Warning message:
In tidy.anova(.) :
  The following column names in ANOVA output were not recognized or transformed: SumOfSqs, R2

And finally with adonis2, by="margin"

set.seed(2022)
adonis2(dune ~ Management+A1, data = dune.env, by="margin") %>% 
broom::tidy(.)

# A tibble: 4 × 6
  term          df SumOfSqs    R2 statistic p.value
  <chr>      <dbl>    <dbl> <dbl>     <dbl>   <dbl>
1 Management     3    1.19  0.276      2.48   0.005
2 A1             1    0.441 0.103      2.77   0.023
3 Residual      15    2.39  0.556     NA     NA    
4 Total         19    4.30  1         NA     NA    
Warning message:
In tidy.anova(.) :
  The following column names in ANOVA output were not recognized or transformed: SumOfSqs, R2

A couple of notes here. The warnings come from the tidy function I used, not from adonis itself. I'm piping the adonis output into broom::tidy() because it returns tibbles that are convenient to work with downstream; broom::tidy() is a very useful way to turn various stats objects into tables. It isn't usually used with adonis output, hence the warnings, but they are not important imo. I still find the output very useful, and it may even help with Justine's goal of getting these results into an artifact format. adonis2() also doesn't require extracting the aov.tab element, so that's one less thing to worry about.
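As an alternative to broom::tidy() and its warnings, a few lines of base R can flatten an anova-style table into a plain data frame with a term column while keeping every original column name (for adonis2() output, that would include SumOfSqs, R2 and Pr(>F)). The helper name anova_to_df() is hypothetical. Because adonis2() returns an anova-style data frame, it should apply to that output too, but the sketch is only exercised here on a stats::anova() table so that it runs without vegan.

```r
# Hypothetical helper: flatten an anova-style table (row names = terms)
# into a plain data.frame with an explicit `term` column.
anova_to_df <- function(tab) {
  data.frame(
    term = rownames(tab),
    as.data.frame(tab),   # drops the "anova" subclass, keeps the columns
    row.names = NULL,     # integer row names instead of term labels
    check.names = FALSE   # keep names such as "Pr(>F)" and "Sum Sq" intact
  )
}

# Example with a base-R linear model ANOVA table
tab <- anova(lm(mpg ~ factor(cyl) + wt, data = mtcars))
anova_to_df(tab)
```

If a tibble is preferred for downstream tidyverse work, the result can be passed to tibble::as_tibble().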

@ebolyen I have some minor suggestions about those awesome plots over on the q2-fmt page you linked. Should I comment there?

mestaki commented 2 years ago

Looks like I was wrong about adonis2() only being available in vegan 2.5-7 and later. As Jari Oksanen mentioned in the Q2 forum post referenced above, it has been around since 2.4-0, so updating this will be even simpler: it's just a matter of switching the existing syntax to adonis2() and exposing one or two additional parameters, which we can discuss here.

colinbrislawn commented 2 years ago

broom::tidy() does not support any vegan objects, but ggvegan does!

Thank you all for working on this problem. I'm excited for q2-fmt

gregcaporaso commented 11 months ago

Other adonis-related feature requests that should be addressed at the same time as this one:

#303

#242

#243