vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0

Adonis2 interactions(*) formula and F.Model -Inf #432

Closed (sarpiens closed 2 years ago)

sarpiens commented 3 years ago

Hello,

I have an Aitchison distance matrix (CLR transformation + Euclidean distance), called temp_d, derived from 16S-based data. I have various groups, and I wanted to analyze their variables with a PERMANOVA for a feces transplant. The variables shown are host (the species from which the samples were obtained) and feces_host (the species from which the input feces came). I have tried the default (by terms) and by margin, but I need help to understand the results.

When using by margin: adonis_marging

When I include interactions (*) in the model formula, only the interaction term appears, not the individual terms, and its F statistic is -Inf (I don't understand the meaning of this -Inf). However, when I use only interactions (:) or only terms (+), it works; the problem arises only when terms and interactions are both included in the model.

When using the default: adonis

When I include interactions (*) in the model formula, only the terms appear, not the interaction term, and the result is identical to using only (+). However, when I use only interactions (:) or only terms (+), it works; the problem arises only when terms and interactions are both included in the model.

Thanks in advance for any help!

jarioksa commented 3 years ago

There are two separate issues here: (1) the meaning of a marginal test of terms, and (2) the redundancy of the interaction term.

Marginal tests study the effect of a term when it is removed from the model containing all other terms. In the interaction model ~ host + feces_host + host:feces_host the only term that can be removed from the model is host:feces_host. Terms host and feces_host are both included in host:feces_host and hence they cannot be removed. For this reason only the interaction term is studied with by = "margin". See ?drop.scope in base R for further discussion.
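A quick way to see which terms a marginal test is allowed to drop is to call drop.scope on the two formulas directly. This is only a sketch in base R (the stats package); it works on bare formulas, so no data are needed.

```r
# Terms that may be dropped while respecting marginality (stats::drop.scope).
# With the interaction present, both main effects are contained in it,
# so only the interaction itself can be dropped.
with_interaction <- drop.scope(~ host + feces_host + host:feces_host)
print(with_interaction)

# Without the interaction, both main effects are marginal terms.
main_only <- drop.scope(~ host + feces_host)
print(main_only)
```

The first call lists only host:feces_host, while the second lists both host and feces_host.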

The second issue is that the interaction term is redundant. In the first (by = "margin") example you can see that when you analyse its effect in a model that already contains ~host + feces_host, nothing happens: the term host:feces_host does not change the model in any way. The change in degrees of freedom (Df) is 0 and the change in sum of squares (SumOfSqs) is 0. F = −∞ simply appears as a result of the calculations that we do with zeros; we have no special treatment of Df=0, SS=0 cases. The interaction term has nil effect and therefore you get these anomalies: only two of host, feces_host and host:feces_host have any effect in your model, and only two will appear in the output. Which two appear varies among models. However, they all define the same model, as you can see from the Residual and Total rows, which are identical in all models.
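One design in which an interaction ends up with 0 Df, offered purely as an illustration and not as a claim about the data in this issue, is a two-by-two layout in which one of the four host/feces_host combinations was never sampled. In the toy data below (factor levels and sample counts are invented), the interaction column of the model matrix duplicates a main-effect column, so adding it does not raise the rank of the design.

```r
# Hypothetical design: three of the four combinations are present,
# (host = "A", feces_host = "b") is missing. All values are invented.
toy <- data.frame(
  host       = factor(c("A", "A", "B", "B", "B", "B")),
  feces_host = factor(c("a", "a", "a", "a", "b", "b"))
)

X_main <- model.matrix(~ host + feces_host, data = toy)
X_full <- model.matrix(~ host * feces_host, data = toy)

# Rank of each design matrix: equal ranks mean the interaction adds 0 Df.
rank_main <- qr(X_main)$rank
rank_full <- qr(X_full)$rank
print(c(main = rank_main, full = rank_full))

# In this toy design the interaction column equals the feces_host column.
same_column <- all(X_full[, "hostB:feces_hostb"] == X_full[, "feces_hostb"])
print(same_column)
```

Running table(toy$host, toy$feces_host) on your own data frame shows whether any combination is empty.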

It seems that both of your terms have 1 degree of freedom. This implies that they have only two levels (two types of host and of feces_host). Was this your intention, or have you forgotten to cast these variables to multilevel factors?

sarpiens commented 3 years ago

So if I understand correctly, in the default mode (by terms) the interaction host:feces_host does not add anything new to the model, and that is why it is removed and only host and feces_host appear.

I was worried because in the example adonis2(dune ~ Management*A1, data=dune.env, permutations=10000) both the terms (Management, A1) and the interaction (Management:A1) appeared.

I think that I get it now.

Thank you very much for the quick response!

PS: Yes, I have two types of host and of feces_host.

jarioksa commented 3 years ago

About that F = −∞: the F is based on a zero change in sum of squares with zero degrees of freedom, so it involves the term 0/0 and the result should strictly be NaN (Not a Number). However, the change in sum of squares is calculated within the function, and it may not be (and usually is not) exactly zero due to round-off errors. In one test I got it as −6×10⁻¹⁰, and dividing that by 0 gives −∞. This is disturbing, but I am not sure if it is so disturbing that I should handle this case separately (if it becomes +∞ and the function is unable to handle that cleanly, I may reconsider).
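The arithmetic can be reproduced in plain R. This is not the code inside adonis2, only the same kind of division applied to illustrative values: the tiny negative number is the one quoted above, and the positive one is added for contrast.

```r
# Change in Df is zero for the redundant term.
delta_df <- 0

# Exactly zero, a tiny negative and a tiny positive change in sum of squares.
delta_ss <- c(0, -6e-10, 6e-10)

# Mean square of the term: 0/0 is NaN, while a non-zero round-off
# error divided by zero becomes -Inf or +Inf depending on its sign.
mean_sq <- delta_ss / delta_df
print(mean_sq)
```

Dividing these values by a positive residual mean square keeps NaN as NaN and leaves the sign of the infinities unchanged, so the sign of F depends only on the direction of the round-off error.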

sarpiens commented 3 years ago

Hello again, I have another question about interpreting R2 values when using the by-term or by-margin mode. Continuing with the above example, imagine that I have the "host" and "feces_host" variables and I repeat the adonis2 analysis, by term or by margin, with only one variable at a time. Would the resulting R2 values be comparable?

For example: adonis2(host) -> R2 = 0.5; adonis2(feces_host) -> R2 = 0.4

Would the variance that "host" explains on its own be greater than that explained by "feces_host"? Or would the correct way to compare them be to use a marginal model with the two variables?