Closed fvlampe closed 5 months ago
Please do not call capscale
to db-RDA: we do have a function for db-RDA, and it is called dbrda
. capscale
is something related, but not the real thing.
I really do not know what is right and wrong in handling negative eigenvalues in capscale
. You can see this scanning commit history of capscale
where alternative ways have been alternating. It is clear that now we have two inconsistent ways of reporting negative eigenvalues. In old times we did not report negative eigenvalues at all, and their handling was fixed in release 1.17-6 (2011-01-10). In vegan 2.4-0 (2016-06-15) we introduced real dbrda
and in this connection we also added reporting on negative eigenvalues in capscale
. However, we did not notice that we should change the summary
as well – mainly because nobody of us was using summary
any longer – and @gavinsimpson had promised to rewrite summary
anyway (issue #203). We also discussed deprecating and ultimately removing capscale
function (issue #217) in favour of dbrda
, but I made a mistake and let it linger with minimal maintenance.
I'll make the summary
consistent with the modern way.
First a WARNING: I made a wrong assumption about weights in wcmdscale
and this precipitates in underestimating the magnitude of imaginary components in capscale
. For correct results and to replicate the results in this message, you need to use either the CRAN release (version 2.6-4) or a current development version. Github version of vegan from November 2022 underestimated the imaginary components.
Then to capscale
.
I have now gone through handling the imaginary components in various vegan functions. We seem to have three different ways of handling those, but I am not sure which of these are misleading (or at least two may be misleading, but perhaps nobody wants to have the third one).
We have two related functions: capscale
and the real db-RDA dbrda
. capscale
was released in vegan 1.6-0 (Oct 2003), and dbrda
in 2.4-0 (Jun 2016). capscale
works by analysing Euclidean mapping input dissimilarities. With non-Euclidean dissimilarities this cannot be done exactly but there are some negative eigenvalues with associated imaginary dimensions that correct the errors of Euclidean mapping. These are ignored in capscale
. dbrda
works with original dissimilarities also taking into account the imaginary dimensions (per component). I considered dropping capscale
from vegan after introducing dbrda
and we briefly discussed this (issue #217), and while the majority voted for removing capscale
I still kept the function (which now again begins to look like a mistake).
Here a comparison of printed output of unconstrained analysis of Dune meadow data with capscale
and dbrda
:
Call: capscale(formula = d ~ 1)
Inertia Rank
Total 4.2990
Unconstrained 4.5939 14
Imaginary -0.2949 5
Inertia is squared Bray distance
Call: dbrda(formula = d ~ 1)
Inertia Rank RealDims
Total 4.299
Unconstrained 4.299 19 14
Inertia is squared Bray distance
The total inertia – or the sum of all eigenvalues – is estimated directly from the input dissimilarities prior to ordination, and it is the same in both analyses. This inertia is the same as the total variation in the data. In dbrda
this total inertia is directly decomposed into constrained and residual variation and both of these may have negative eigenvalues (though usually only unconstrained component has those apart from overfitted models). In capscale
the unconstrained component is larger than the total inertia, meaning that they display higher variance than the data. That extra variance would be handled by the imaginary component, and Total = Unconstrained + Imaginary (with minus sign for Imaginary component). capscale
ignores the imaginary component and only analyses the real component. For capscale
we have two different ways of expression the variation of these components, like you reported:
### print
Call: capscale(formula = d ~ Management + Moisture, data = dune.env)
Inertia Proportion Rank
Total 4.2990 1.0000
Constrained 2.6771 0.6227 6
Unconstrained 1.9169 0.4459 13
Imaginary -0.2949 -0.0686 5
Inertia is squared Bray distance
### summary
Partitioning of squared Bray distance:
Inertia Proportion
Total 4.594 1.0000
Constrained 2.677 0.5827
Unconstrained 1.917 0.4173
The first default print
compares proportions to the total inertia of the input matrix (4.299), but instead we analysed its Euclidean mapping which had higher inertia and ignored the imaginary component. This means that for inertiae, constrained + unconstrained > total, and for proportion constrained + unconstrained > 1, or in this case 1.07 (107%). The summary
(which uses eigenvals
function) compares the components to the analyses Euclidean mapping and for them 2.677 + 1.917 = 4.594, and for proportions 0.5827 + 0.4173 = 1. Surely one of these ways is misleading, but I'm not sure which one.
It is clear that we have been inconsistent, and this inconsistency must be cleared. We have used three different strategies:
summary
as well as in eigenvals
and its summary
that gives the cumulative proportions. It is also used in permutation tests in anova
& permutest
.capscale
for similar reasons. For instance bstick
(broken sticks) refuse results with negative eigenvalues, either capscale
or dbrda
. An amusing detail is that summary
of eigenvals
for dbrda
refuses to give cumulative proportions explained, although it gives these proportions for each axis (and with positive sign for negative eigenvalues – this clearly needs to be fixed.Something must be done, but I'm not sure what. Option one is to follow the majority of methods that compare the proportions to real components and ignores imaginary. The only change needed is to make the brief result look like:
Inertia Proportion Rank
Total 4.2990
Real 4.5939 1.0000
Constrained 2.6771 0.5827 6
Unconstrained 1.9169 0.4173 13
Imaginary -0.2949 5
Option 2 is to change other functions to compare proportions to the original total inertia, which means that people must learn to live with >100% proportions. Obviously this concerns eigenvals
(where dbrda
needs fixing anyway), but also some other functions.
In both cases those functions that refuse to handle negative eigenvalues could be made consistent with this decision.
Finally, I'm still more attracted to deprecating capscale
and pointing people to use dbrda
instead.
Opinions are appreciated.
Thanks a lot for taking care of this topic so quickly and for the detailed analysis and explanation!
I am not expert enough to have a clear opinion, but it seems to me that Option 2 is very unintuitive and explained proportions seem overestimated and not meaningful.
Therefore consistent integration of Option 1 would allow to always recognize the presence of the imaginary component and get consistent proportions. Maybe even a note could be printed, indicating that the imaginary component is ignored in calculating the proportions. In that case, I guess the user should then go for Option 3 and apply one of the available corrections to avoid negative components at all.
However, dbrda
seems to follow Option 2, but is simply substracting the imaginary component from the unconstrained component? I wonder if explained proportions for the constrained variation are meaningful at all here (suggesting Option 3)...
I have remade my analysis with dbrda
and don't see any reason to keep capscale
right now.
I think I have now changed most things to be consistent following option 1: ignore imaginary components and only base proportions on real components:
Call: capscale(formula = dune ~ Condition(Moisture) + Management, data
= dune.env, distance = "bray")
Inertia Proportion Rank
Total 4.2990
RealTotal 4.5939 1.0000
Conditional 1.7440 0.3796 3
Constrained 0.9331 0.2031 3
Unconstrained 1.9169 0.4173 13
Imaginary -0.2949
Inertia is squared Bray distance
The corresponding part in summary
reads:
Partitioning of squared Bray distance:
Inertia Proportion
Total 4.5939 1.0000
Conditioned 1.7440 0.3796
Constrained 0.9331 0.2031
Unconstrained 1.9169 0.4173
RsquareAdj
was also changed to return:
$r.squared
[1] 0.2031098
$adj.r.squared
[1] 0.133984
Many other functions changed on the way. See commit history if you want to see all changes. Some of these changes do not directly relate to these capscale
changes, but things I found while inspecting this issue.
Some bigger associated changes are:
summary
was made less verbose per wishes in issue #203.dbrda
: summary
of eigenvals
shows now proportions of eigenvalues against sum of all eigenvalues, or the original inertia of the input dissimilarities. This means that proportions add up to >1 for positive eigenvalues and goes down to 1 with negative eigenvalues. Here I'm open to opinion. I haven't yet decided how to handle broken sticks of eigenvalues: now bstick
refuses to touch dbrda
because she doesn't know what is the total inertia.
Dear devs,
I recently calculated db-RDA with capscale and realised one thing in the model outputs:
The reported results differ when calling the model object directly with
model
and when callingsummary(model)
. When calling the model object negative Eigenvalues are reported as seperate item of sum of negative eigenvalues (Imaginary; as stated in the documentary). When callingsummary(model)
the item of sum of negative eigenvalues is entirely missing and all proportions are calculated only based on the axes with non-negative Eigenvalues. This may be very misleading, especially when one is only usingsummary()
to call the full results?Thanks for all and all the best, Friedemann
Here an example output: