vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0

Capscale and anova.cca problem #328

Closed: burbrink closed this issue 5 years ago

burbrink commented 5 years ago

I have what I hope is an easy question. I am using capscale and anova.cca in vegan to examine the relationship between genetic distance among individuals and several ecological niche model distance matrices generated from Circuitscape, while partialling out distance in space (from lat/lon). I have done the following: converted all of the distances on the RHS with PCNM and used "scores" to pull the scores from the PCNM results. My genetic distance is formatted as a "dist" object on the LHS. The model looks as follows:

capscale(gendist~scores(cur)+scores(ms)+scores(el)+scores(lgm)+scores(lig)+scores(mis19)+Condition(scores(dist2)),sqrt.dist=T)->try
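
For context, the setup behind this call is roughly as follows (input names such as cur_dist, ms_dist and latlon are placeholders for my Circuitscape matrices and coordinates, not the actual object names):

library(vegan)
cur   <- pcnm(as.dist(cur_dist))   # one pcnm() per Circuitscape distance matrix
ms    <- pcnm(as.dist(ms_dist))    # ... and likewise for el, lgm, lig, mis19
dist2 <- pcnm(dist(latlon))        # spatial PCNM from the lat/lon coordinates
# gendist is the pairwise genetic distance, stored as a "dist" object (LHS)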

This works and so does:

anova(try, by = "terms")

However, when I use anova(try, by = "margin") I get this error:

Error in X[, ass != i, drop = FALSE] :
  (subscript) logical subscript too long

If I choose a smaller number of axes, it works, but the results seem unstable (P values change from significant to non-significant and vice versa depending on the number of axes I choose). This model, for instance, works with anova(try, by = "margin"):

capscale(gendist~scores(cur,choices=1:20)+scores(ms,choices=1:20)+scores(el,choices=1:20)+scores(lgm,choices=1:20)+scores(lig,choices=1:20)+scores(mis19,choices=1:20)+Condition(scores(dist2,choices=1:20)))->try

If I increase the number of axes to 50, I get the same error as above.

What could be causing this error, and is there a way to get a stable answer using anova.cca with margins?

I thank you very much in advance!

Frank

P.S.

Here are some outputs from the full-axes and reduced-axes models:

Call: capscale(formula = gendist ~ scores(cur) + scores(ms) + scores(el) + scores(lgm) + scores(lig) + scores(mis19) +
Condition(scores(dist2)), sqrt.dist = T)

                 Inertia Proportion Rank
Total         30.3890110  1.0000000    
Conditional   20.2125899  0.6651283  115
Constrained    9.9498632  0.3274165  118
Unconstrained  0.2551651  0.0083966    4
Imaginary     -0.0286072 -0.0009414    5
Inertia is Nei distance

Call: capscale(formula = gendist ~ scores(cur, choices = 1:20) + scores(ms, choices = 1:20) + scores(el, choices = 1:20) +
scores(lgm, choices = 1:20) + scores(lig, choices = 1:20) + scores(mis19, choices = 1:20) + Condition(scores(dist2, choices
= 1:20)))

              Inertia Proportion Rank
Total         10.1488     1.0000    
Conditional    6.9685     0.6866   20
Constrained    3.1599     0.3114  120
Unconstrained  1.9522     0.1924   97
Imaginary     -1.9319    -0.1904   88
Inertia is squared Nei distance
Some constraints were aliased because they were collinear (redundant)

eduardszoecs commented 5 years ago

Could you try packing the scores into a data.frame, using the data = argument, and adapting the right-hand side of the formula accordingly?
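
Something along these lines, maybe (an untested sketch; object names follow your post, everything else is illustrative):

library(vegan)

# collect all PCNM scores in one data frame; data.frame() expands each score
# matrix into separate columns (names prefixed with cur., ms., ..., space.)
env <- data.frame(cur   = scores(cur),
                  ms    = scores(ms),
                  el    = scores(el),
                  lgm   = scores(lgm),
                  lig   = scores(lig),
                  mis19 = scores(mis19),
                  space = scores(dist2))

# build the right-hand side from the column names instead of nesting scores()
constr <- grep("^space", names(env), value = TRUE, invert = TRUE)
cond   <- grep("^space", names(env), value = TRUE)
fml    <- reformulate(c(constr,
                        sprintf("Condition(%s)", paste(cond, collapse = " + "))),
                      response = "gendist")

mod <- capscale(fml, data = env, sqrt.dist = TRUE)
anova(mod, by = "margin")

Note that in this layout every PCNM axis becomes its own term, so by = "margin" would test single axes rather than whole matrices; that may or may not be what you want.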

burbrink commented 5 years ago

I am not entirely sure what you are asking me to do with the "data = argument" suggestion. I have packed all of the scores from the RHS of the formula (excluding the Condition() variable, which is just the spatial points) into a data.frame. I have attached it here as a plain table (use read.table); the scores for each variable are labelled in the header as .1, .2, etc. I have also attached the genetic distances from the LHS and the conditioning spatial points (also saved as tables). Thank you!

data_for_vegan_table.txt

gendist.txt points.txt
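
They can be read back in roughly like this (only a sketch: I am assuming whitespace-separated tables and that gendist.txt holds a square distance matrix):

env     <- read.table("data_for_vegan_table.txt", header = TRUE)
points  <- read.table("points.txt", header = TRUE)
gendist <- as.dist(as.matrix(read.table("gendist.txt", header = TRUE)))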

jarioksa commented 5 years ago

I can confirm this error with the data you supplied (although these data do not conform to the example in your original post). The behaviour may be difficult to fix since the error comes from deep within R: some strings just become too long for R to handle. I can have a look at this, but not urgently. The real problem is that you should not be able to do this kind of analysis at all, nor any other kind of anova, but we fail to detect this and stop the analysis earlier.

The real problem with your analysis is that you have 238 observations in your distance matrix, and the PCNM data you supplied had 1263 variables. You could completely explain your data with 237 random variables, so you have a very severe case of overfitting. In fact, the residual component should be zero and no significance test should be possible.

You got away with this thanks to a methodological artifact: your dissimilarities are non-metric, and no linear regression can explain non-metric data completely with real-valued predictors. In your case, it seems that the rank of your data is 233, with 229 real dimensions and 4 imaginary ones (with sqrt.dist = TRUE). The residual component is based on the real predictors only, so the residuals are actually the non-metric component of your dissimilarities: instead of a zero residual you get the degree of non-linearity of your dissimilarities as the residual. This is of course wrong. I should look at how to handle this and stop all attempts at significance tests early in cases like this (not in capscale, which is an approximate method anyway, but in dbrda).

With distance data on 238 observations you simply cannot use as many variables as you have used; you must honour the limits of your data. Far fewer than 200 predictors is the maximum for any sensible model (you could probably handle 230-something variables technically, but such models would make no sense).
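
To put numbers on this, a quick check along these lines (a sketch using the objects from your post; the exact axis counts depend on your PCNM runs):

n <- attr(gendist, "Size")   # number of observations in the distance matrix: 238
# total number of constraining and conditioning axes offered to the model
p <- sum(sapply(list(cur, ms, el, lgm, lig, mis19, dist2),
                function(z) ncol(scores(z))))
p                            # well over n - 1 = 237 here
# any model approaching n - 1 predictors fits the data (almost) perfectly,
# so keep the total number of axes far below that limit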