vegandevs / vegan

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis
https://vegandevs.github.io/vegan/
GNU General Public License v2.0

likely error in CLR formula in decostand docs #621

Open handibles opened 7 months ago

handibles commented 7 months ago

Dear Devs,

Thanks as always for the work. I was using decostand as a quick lookup for the CLR formula, but I think it's wrong in the vegan docs.

The code for CLR (.calc_clr) is fine, since it uses the mean of logs:

means <- rowMeans(clog)

The docs, however, indicate that the formula is $log(x) - log(u)$, where $x$ are the beasties and $u$ is the beastie mean, i.e. the log of the mean.

Sadly,

> mean(log(1:10)) == log(mean(1:10))
[1] FALSE
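For context, the mean-of-logs step quoted above can be sketched on a small matrix. This is an illustration only, not vegan's actual source: the matrix `x` is made up, and `clog <- log(x)` is an assumption about what precedes the `rowMeans(clog)` line.

```r
# Hypothetical 2 x 3 abundance matrix (samples in rows); values are made up.
x <- matrix(c(1, 2, 4,
              10, 10, 10), nrow = 2, byrow = TRUE)

# Assumed preceding step: element-wise natural log.
clog <- log(x)

# Mean of logs per sample, as in the quoted line.
means <- rowMeans(clog)

# Log of the arithmetic mean per sample, as the formula was read in the docs.
log_of_means <- log(rowMeans(x))

# The two agree only when every value in a row is identical (row 2 here).
print(cbind(mean_of_logs = means, log_of_mean = log_of_means))
```

In the first row the mean of logs is smaller than the log of the mean, so subtracting one or the other from `clog` would give different results.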

Might save the next veganner from a mishap. All the best, CH

jarioksa commented 5 months ago

@antagomir Have you looked at this?

antagomir commented 5 months ago

Whoops, yes. This we should be able to solve asap. I will have a look today/tomorrow and open a PR. The broader rCLR improvements might take more time.

antagomir commented 5 months ago

I think the issue confused arithmetic and geometric mean, and the documentation is correct.

The documentation states that

clr = log(x/g(x)) = log x - log (g(x)), where g(x) denotes the geometric mean.

This is how the CLR transformation is formally defined.

The log of the geometric mean can be written as:

log(g(x)) = log((x1 * ... * xn)^(1/n)) = (1/n) * (log(x1) + ... + log(xn)) = mean(log(x))

Thus, log(g(x)) = mean(log(x)), where g(x) is geometric mean.

This is also seen with:

gm_mean = function(a){prod(a)^(1/length(a))}
log(gm_mean(1:10))
[1] 1.510441

mean(log(1:10))
[1] 1.510441
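One practical aside on the check above: `prod(a)^(1/length(a))` overflows for long vectors, whereas the mean-of-logs form does not. A minimal sketch, where `gm_mean_log` is a hypothetical helper name and not a vegan function:

```r
# Geometric mean via the product, as in the one-liner above.
gm_mean <- function(a) prod(a)^(1 / length(a))

# Hypothetical alternative: exponentiate the mean of logs (assumes all a > 0).
gm_mean_log <- function(a) exp(mean(log(a)))

# Both forms agree on a short vector.
print(all.equal(gm_mean(1:10), gm_mean_log(1:10)))

# For a long vector the product overflows to Inf; the log form stays finite.
print(is.finite(gm_mean(1:200)))
print(is.finite(gm_mean_log(1:200)))
```

This may be one reason to compute the mean on the log scale rather than forming g(x) explicitly, although the thread does not say why the code is written that way.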

I am not sure whether the documentation could/should be improved since it already states that g(x) is the geometric mean, and this is how the CLR definition is written in most sources afaik.

As far as I can see the documentation is correct and this issue could be closed unless there are further suggestions on how to improve it.

handibles commented 5 months ago

Thanks @antagomir. I've yet to find a solid mnemonic for the order in the CLR transform (hence the OP), but if I've got this correctly:

Definition of the CLR (as above):

clr = log(x/g(x)) = log x - log(g(x))

i.e.,

clr = log(x/g(x)) = log x - mean(log(x))

Definition in the decostand documentation (latex'd formula):

$clr = log(x) - log(u)$, where u is the arithmetic mean.

Is that not incorrect?...
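To make the difference concrete, here is a minimal base-R sketch comparing the two readings on a single sample. The vector `x` and the variable names are made up for illustration; nothing here calls vegan.

```r
# Hypothetical composition of four taxa; values are made up.
x <- c(1, 2, 4, 8)

# CLR with the geometric mean: log(x) - mean(log(x)).
clr_geometric <- log(x) - mean(log(x))

# Variant with the arithmetic mean u: log(x) - log(mean(x)).
clr_arithmetic <- log(x) - log(mean(x))

# A CLR-transformed sample sums to zero (up to floating-point error);
# the arithmetic-mean variant does not.
print(sum(clr_geometric))
print(sum(clr_arithmetic))

# The two differ by one constant offset per sample.
print(unique(round(clr_geometric - clr_arithmetic, 12)))
```

Because the two versions differ only by a per-sample constant, the slip is easy to miss by eye; the sum-to-zero check is a quick way to tell them apart.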

antagomir commented 5 months ago

Thanks @handibles - yes - that seems incorrect.

Looking at the current master branch of the man pages in github:vegandevs/vegan, lines 94-109 (clr documentation) refer to the geometric mean. So at least that manpage seems to be correct.

Where did you find that incorrect formula exactly - could you point me to the exact source? I will correct asap if we have a mistake anywhere. But right now I am not able to trace this..!