vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0
442 stars 96 forks source link

need larger value of 'ncol' as pivoting occurred #452

Open ZhangDengwei opened 2 years ago

ZhangDengwei commented 2 years ago

Hi,

I used adonis2 with the following code

adonis2(df_pathway_f~., bgc_exp_f, permutations = 999, distance = 'bray', by = "margin")

But I got the error need larger value of 'ncol' as pivoting occurred

I understand this is due to too large size of bgc_exp_f with ~160 columns, could you have any clues on how to fix it. Thanks in advance!

jarioksa commented 2 years ago

I cannot guess: the error message is not from vegan, but from an external function that vegan calls (and I don't know which function). Please issue traceback() after getting the error to see where this situation occurs.

gavinsimpson commented 2 years ago

Looks like the error comes from qr.X() https://rdrr.io/github/robertzk/monadicbase/src/R/qr.R

Note sure that helps us much as that gets called in a few placed that would be hit by the call to adonis2()

jarioksa commented 2 years ago

It indeed comes from qr.X function that we do not call from adonis2 nor from anova.ccabymargin (it is called from anova.ccabyaxis, but that was not called here). Without proper debugging information this looks too tedious to track. We would need a reproducible example: something we can run and get the error. If that cannot be provided, at least a traceback() to tell us how to reach this error. If we are unable to get the error or even see where it comes from, nothing can done.

I expand a bit: we do not call qr.X() from vegan, but obviously some function that we call from vegan calls an external function that calls a function that ... calls a function that calls qr.X. The best we can do is to find the point where the control moves from vegan to those other functions, and to see if there is something we can do before getting to that event horizon. However, we need a reproducible example.

Rob-murphys commented 1 year ago

Not meaning to necro this but it only seems to occur when using ´by="margin"` with interactions specified in the formula. For example:

# This will give the error
> adonis2(bray_dismat.norm.no_na ~ caste_phase * species * genus, by = "margin", permutations = 999, data = comb_metadata.sorted.no_na)
Error in qr.X(object$CCA$QR) : 
  need larger value of 'ncol' as pivoting occurred

# This will not give the error
> adonis2(bray_dismat.norm.no_na ~ caste_phase + species + genus, by = "margin", permutations = 999, data = comb_metadata.sorted.no_na)
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

adonis2(formula = bray_dismat.norm.no_na ~ caste_phase + species + genus, data = comb_metadata.sorted.no_na, permutations = 999, by = "margin")
             Df SumOfSqs      R2      F Pr(>F)    
caste_phase   3    4.402 0.05502 3.8621  0.001 ***
species       4    6.161 0.07701 4.0540  0.001 ***
genus         0    0.000 0.00000   -Inf           
Residual    156   59.271 0.74081                  
Total       167   80.009 1.00000                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
jarioksa commented 1 year ago

No reproducible example. Nothing can be done.

Rob-murphys commented 1 year ago

Apologies I was not trying to seek help just state my experience. I can give data for a reproducible example tomorrow but for what it is worth it seems to occur for any large dataset.

jarioksa commented 1 year ago

One easy thing that often helps to call traceback() after the error. This prints out the call stack which shows where this error occurred. As I wrote in my previous messages, adonis2 (and anova.cca by="margin") do not call qr.X and therefore I do not know in which function this error crops out. This can sometimes help even without a reproducible example because then we know where to search.

jarioksa commented 1 year ago

Now I know where this happens and how we get there. I replaced base::qr.X with a bogus function and got the call stack after error:

> qr.X # bogus function to replace base::qr.X
function(...) stop("hit qr.X")
<environment: namespace:base>
> adonis2(dune ~ ., dune.env, by="margin") # by = "margin" hits bogus qr.X which gives an error
Error in qr.X(object$CCA$QR) : hit qr.X
> traceback()
8: stop("hit qr.X") at #1
7: qr.X(object$CCA$QR)
6: model.matrix.rda(object)
5: model.matrix(object)
4: anova.ccabymargin(object, permutations = permutations, model = model, 
       parallel = parallel, scope = scope)
3: anova.cca(sol, permutations = perm, by = by, parallel = parallel)
2: anova(sol, permutations = perm, by = by, parallel = parallel)
1: adonis2(dune ~ ., dune.env, by = "margin")

Standard disclaimer: Don't do this at home! – you do not want to replace base::qr.X with a bogus function.

Clearly the calling sequence of vegan functions is adonis2(..., by="margin")anova.ccabymarginmodel.matrix.rda. Therefore the problem should be fixed in model.matrix.rda (and probably also model.matrix.cca has the same problem). I have no idea which conditions generate this error. I tried a few cases, but all where successful in the sense that no error occurred.

The same problem should be hit in anova(..., by = "margin") of rda (and probably cca) results.

We need a test case that produces this error.

jarioksa commented 1 year ago

Here a reproducible example:

> adonis2(dune ~ Moisture * Use * A1, dune.env, by = "margin")
Error in qr.X(object$CCA$QR) : 
  need larger value of 'ncol' as pivoting occurred
> m <- rda(dune ~ Moisture * Use * A1, dune.env)
> anova(m, by="margin")  # anova.ccabymargin calls model.matrix
Error in qr.X(object$CCA$QR) : 
  need larger value of 'ncol' as pivoting occurred
> anova(m, by="axis")   # anova.ccabyaxis calls model.matrix
Error in qr.X(object$CCA$QR) : 
  need larger value of 'ncol' as pivoting occurred

Here the core of the problem:

> m
Call: rda(formula = dune ~ Moisture * Use * A1, data = dune.env)

               Inertia Proportion Rank
Total         84.12368    1.00000     
Constrained   78.15147    0.92901   17
Unconstrained  5.97222    0.07099    2
Inertia is variance 
Some constraints or conditions were aliased because they were redundant
## -- clip -- ##
> alias(m, names=TRUE)
[1] "Moisture.Q:Use.Q"    "Moisture.C:Use.Q"    "Moisture.C:A1"      
[4] "Moisture.C:Use.L:A1" "Moisture.Q:Use.Q:A1" "Moisture.C:Use.Q:A1"

Explanation: when you have higher-order interactions, some of these interaction terms become redundant and explain nothing. In this case, the rank of constrained component was 17 leaving 2 degrees of freedom for the residual component. Six interaction terms were aliased, meaning that the full design matrix had 17 + 6 = 23 columns. The data have 20 rows (observations), and with its default settings qr.X can only construct a design matrix of 20 columns, and needs more columns for pivoting (or handling the pivoted redundant variables).

There is an argument (ncol =) in qr.X which can be set to higher values, but I haven't checked how it fills the aliased columns and will that work with vegan:::anova.cca. There are also other alternatives that we may inspect: Do we need model.matrix in anova.ccabymargin & anova.ccabyaxis? Can we detect these cases and handle these smoothly?

This error will occur in special cases: the number of non-aliased terms must be < n-1 so that there is residual variation left and anova will be run, but the number of aliased terms must be so high that together with non-aliased terms the model matrix has > n columns. If there are ≥ n-1 non-aliased terms, anova.cca will catch this case and use anova.ccanull:

> adonis2(dune ~ matrix(runif(20*19), 20, 19), dune.env, by = "margin")
No residual component

adonis2(formula = dune ~ matrix(runif(20 * 19), 20, 19), data = dune.env, by = "margin")
         Df SumOfSqs R2 F Pr(>F)
Model    19    4.299  1         
Residual  0    0.000  0         
Total    19    4.299  1      

I think it makes no sense to use by = "margin" if you have high-order interactions, because the only marginal term will be the highest order interaction of all predictors in the model. You get that term as the last ("marginal") also with by = "term" with all other terms sequentially as a bonus.

jarioksa commented 1 year ago

This is fixed in commit a64ed4e67d1a0a5. Please check this and report on success.

The error occurred with two conditions:

Most commonly this happened in models with high-order interactions including factor variables which typically have several redundant combinations of factor levels.

The bug affected adonis2 with by="margin" and constrained ordination methods (rda, cca, dbrda) in anova with by="margin" or by="axis".

Although the bug is now avoided, you usually do not want to have permutations tests by = "margin" with high-order interactions, because the marginal term typically is one of those redundant levels, and no finite P-value can be reported.