I wonder if part of it is that, when we standardize, we scale by the maximum, so if much of the function is clustered away from 0 we never get a true 0-to-1 spread (i.e., from the observed minimum to the observed maximum). While scaling by the maximum seems desirable, it might also bias the evenness upwards.
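A quick toy example of what I mean (the numbers are made up): scaling by the max leaves everything piled up well above 0, so the proportions that feed any evenness calculation come out much more even than under a full 0-1 rescaling.
f <- c(6, 7, 8, 9, 10)                     # a function clustered well away from 0

f_max <- f / max(f)                        # scale by the max: 0.6 to 1.0, never near 0
f_01  <- (f - min(f)) / (max(f) - min(f))  # full 0-1 scale: 0.0 to 1.0

# proportions that would feed an evenness calculation
f_max / sum(f_max)  # fairly even
f_01 / sum(f_01)    # much less even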
Two thoughts. First, we could standardize everything on a 0-1 scale, making the tacit assumption that we have a minimum and maximum for the system. That's straightforward to implement. The second thought… actually, no, it would reduce to the same thing, particularly if true 0s are real.
But, hrm, we already functionally do 0-1 when we standardize an inverse or negative function. So maybe 0-1 for all makes sense?
Yeah, no - standardizing between 0 and 1 changes the distribution quite a bit; I don't think that's a good idea. It was the fallout of this standardization that produced the 'jack-of-all-trades' paper (we think), because doing so would always produce the S-shape.
But I checked, and both standardizations do change the evenness factor (although not consistently in any direction), so yeah, we might need to think about this.
The issue is, there’s going to need to be some sort of scale that is all positive and common across all functions in order to invoke any of the methodologies here.
Hrm.
One could be a Hedges' g-based metric, (F - Fmin) / sd(F), or a log ratio, ln(F / Fmin).
This has the added bonus of no longer invoking the maximum, although we trade that for the minimum. On the other hand, if there is a theoretical minimum (often 0), that might help with something like the Hedges-style standardization. As a further bonus, for functions that have a negative minimum, it would obviate the need for futzing with additive corrections.
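A quick numeric illustration of the two options (the function values here are made up):
f <- c(2, 5, 9, 14)

(f - min(f)) / sd(f)  # Hedges-style: starts at 0, expressed in units of the SD
log(f / min(f))       # log ratio to the minimum: starts at 0, needs min > 0 or an offset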
Thoughts?
I like the idea of having a solution for negative values of functions. I have to admit I'm struggling to get my head around the problem. If I understand correctly, all the treatments (i.e. lambda) had the same max value (1), right? So how would the weak relationship between lambda and evenness be due to standardization? Maybe standardization could be a problem, but I don't see that it is the problem here.
Do we need to worry about the shallow relationship? Given the links to information theory and to the concept we are wrestling with, "the number of equally provided functions", perhaps there is a non-intuitive but mathematically sound reason why skew affects that metric less than we'd expect?
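A back-of-the-envelope check of that intuition, taking the evenness factor to be the effective number of functions (the exponential of the Shannon entropy of the function proportions) divided by the number of functions at q = 1 (the exact calculation in the package may differ in details):
eff_n <- function(f) {
  p <- f / sum(f)
  exp(-sum(p * log(p)))  # Shannon entropy -> effective number of functions
}

even_f <- rep(1, 10)                    # perfectly even functions
skew_f <- seq(0.1, 1, length.out = 10)  # 10-fold spread in function values

eff_n(even_f)       # 10, by construction
eff_n(skew_f)       # roughly 8.6 - still close to 10
eff_n(skew_f) / 10  # an evenness factor of roughly 0.86, despite the spread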
Yes, you're absolutely right, I don't think that standardization is the main culprit here. I threw it in because I was wondering if it could affect the evenness (which it can - therefore we need to give it some thought) but no, it does not make it systematically more even.
And yes, the simulations shown above are independent of standardization; they are simply meant to illustrate how skewed the distributions of functions need to be before the evenness factor considerably alters MF away from the average metric - especially as there will need to be large differences in evenness to affect the correlation with the average metric (i.e. if the evenness factor is 0.7 ± 0.1 across the board, it's basically a constant).
To what extent do we think this is because, within this framework, MFa puts a lower bound on MFn (and MFe), as I show in the revised text? So MF based on the number of effective functions, or on evenness, is always going to be close to MFa but will not precisely match it.
So, in my inimitable wisdom, I actually built a flexible standardization framework into the multifunc library. There is one standardization function that handles all of the data manipulation but calls out to different standardizing functions.
Note, this goes with the usual caveat that I want to rewrite the whole thing using tidy principles one day, but that is neither here nor there.
Now, the default is to put everything on the unit scale. And as we've outlined, we take functions with negative values and bump them up until their lowest value is 0. This may not be good, and @FabianRoger has pointed out some big flaws. So, let's explore a few different standardization schemes a bit more.
library(multifunc)
library(dplyr)
library(ggplot2)
# This is what is in the package now
getStdAndMeanFunctions <- function(data, vars,
                                   standardizeFunction = standardizeUnitScale,
                                   ...) {
  # standardize each function column, then take the row-wise average
  ret <- plyr::colwise(standardizeFunction, ...)(data[, which(names(data) %in% vars)])
  names(ret) <- paste(names(ret), ".std", sep = "")
  ret$meanFunction <- rowSums(ret) / ncol(ret)

  return(ret)
}

standardizeUnitScale <- function(afun, min0 = T, maxValue = max(afun, na.rm = T)) {
  # bump functions with negative values so their minimum is 0, then divide by the max
  if (min0 && min(afun, na.rm = T) < 0)
    afun <- afun + abs(min(afun, na.rm = T))

  afun / maxValue
}

standardizeZScore <- function(afun)
  (afun - mean(afun, na.rm = F)) / sd(afun, na.rm = T)

# These two are new
standardizeHedges <- function(afun, minFun = min(afun, na.rm = T))
  (afun - minFun) / sd(afun, na.rm = T)

standardizeLR <- function(afun, minFun = min(afun, na.rm = T), offset = 1)
  log(afun + offset) - log(minFun + offset)
The advantage of these latter functions is that we can specify a minimum - and it can be 0, if we want! Or, if we want to use the observed minimum as the minimum, we can do that instead.
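For example, on a toy function whose observed minimum sits above zero (made-up values), the choice of minimum matters:
f <- c(2, 4, 6, 10)

standardizeHedges(f)              # relative to the observed minimum (2)
standardizeHedges(f, minFun = 0)  # relative to a theoretical minimum of 0

standardizeLR(f)                  # log ratio to the observed minimum (plus the offset)
standardizeLR(f, minFun = 0)      # log ratio to a theoretical minimum of 0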
Let's play with this a bit using the duffy data. Note, I'm going to flip and move the lower bound to 0 for total algal mass and chl a.
data("duffy_2003")
duffyAllVars <- qw(grazer_mass,wkall_chla,tot_algae_mass,
Zost_final_mass,sessile_invert_mass,sediment_C)
duffyAllVars.std <- paste0(duffyAllVars, ".std")
duffy <- duffy_2003 %>%
dplyr::select(treatment, diversity, one_of(duffyAllVars)) %>%
dplyr::mutate(wkall_chla = -1*wkall_chla + max(wkall_chla, na.rm=T),
tot_algae_mass = -1*tot_algae_mass + max(tot_algae_mass, na.rm=T))
Now, the standardizations and the calculation of functional evenness and multifunctionality. Note that, as some standardizations yield maximum function values > 1, I'm going to rescale to 0-1 by dividing by the max.
# stack up the three standardizations into one data frame
duffy_std <- rbind(
  cbind(duffy, type = "standardizeUnitScale",
        getStdAndMeanFunctions(duffy, duffyAllVars)),
  cbind(duffy, type = "standardizeHedges",
        getStdAndMeanFunctions(duffy, duffyAllVars, standardizeHedges, minFun = 0)),
  cbind(duffy, type = "standardizeLR",
        getStdAndMeanFunctions(duffy, duffyAllVars, standardizeLR))
)

# evenness, then multifunctionality, rescaling the average function
# to a 0-1 range within each standardization
duffy_std <- duffy_std %>%
  mutate(`Functional Evenness` = funcEven(duffy_std, duffyAllVars.std, q = 1)) %>%
  group_by(type) %>%
  mutate(Multifunctionality = `Functional Evenness` * meanFunction / max(meanFunction),
         meanFunction.std = meanFunction / max(meanFunction)) %>%
  ungroup()
OK, that done, let's first see if the averages differ much
ggplot(duffy_std,
       aes(x = diversity, y = meanFunction.std, color = type)) +
  geom_point() +
  facet_wrap(~type) +
  ylim(c(0, 1))
Eh, somewhat, but not too much from methodology to methodology. How different are the results?
ggplot(duffy_std,
       aes(x = diversity, y = `Functional Evenness`, color = type)) +
  geom_point() +
  facet_wrap(~type) +
  ylim(c(0, 1)) +
  stat_smooth(method = "lm")
ggplot(duffy_std,
       aes(x = diversity, y = Multifunctionality, color = type)) +
  geom_point() +
  facet_wrap(~type) +
  ylim(c(0, 1)) +
  stat_smooth(method = "lm")
There are actually some real differences here, which is nice. Note that MFe, however, is never < 0.5. Now, it could be that functional provisioning is just fairly even across all methods here. Results are broadly similar.
So, how correlated are the results with the average? Let's look at both evenness and MF itself (noting that this won't change if we use MFn, as that differs only by a scaling factor).
# correlation with MFa
ggplot(duffy_std,
       aes(x = meanFunction.std, y = Multifunctionality, color = type)) +
  geom_point() +
  facet_wrap(~type) +
  ylim(c(0, 1))
duffy_std %>%
  group_by(type) %>%
  summarize(mfe_cor = cor(`Functional Evenness`, meanFunction),
            mf_cor = cor(Multifunctionality, meanFunction)) %>%
  knitr::kable(digits = 3)
| type                 | mfe_cor | mf_cor |
|----------------------|---------|--------|
| standardizeUnitScale |   0.699 |  0.977 |
| standardizeHedges    |   0.602 |  0.968 |
| standardizeLR        |   0.633 |  0.943 |
Pretty correlated, regardless! I really think this is a product of evenness being constrained by the average - higher average means higher evenness/number of effective functions, regardless. That's why a metric that is the product of the two is the only way to go, really.
And FYI, here's the relationship between average and evenness
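For reference, that relationship can be plotted the same way as the panels above (a sketch, not necessarily the exact code behind the figure):
ggplot(duffy_std,
       aes(x = meanFunction.std, y = `Functional Evenness`, color = type)) +
  geom_point() +
  facet_wrap(~type) +
  ylim(c(0, 1))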
Looking at the above, average function is always high, so evenness will always be high - hrm - @FabianRoger maybe try these different standardization functions with some different simulations to see if you find bigger differences?
Following our Zoom discussion, this is just how it is in nature! This is not a bug, but a feature and a truth. We are all comfortable with this.
I noticed that for most function values, even when drawn from a log-normal distribution, the evenness factor stays close to 1, effectively reducing the MF metric to the average metric. I did some simulations to look at how the evenness factor behaves for different function distributions.
The values of seq(0, 10, 0.2) are effectively the 'functions' (51 of them). So for the evenness factor to drop below 0.5, the function distribution must be extremely steep...
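For concreteness, here is a rough sketch of that kind of simulation (not the exact setup: here the 51 'functions' are simply drawn from a log-normal with increasing sdlog, and the evenness factor is taken to be the effective number of functions at q = 1 divided by the number of functions):
even_factor <- function(f) {
  p <- f / sum(f)
  p <- p[p > 0]                        # treat 0 * log(0) as 0
  exp(-sum(p * log(p))) / length(f)    # effective number / number of functions
}

set.seed(42)
sd_vals <- seq(0.1, 3, by = 0.1)       # increasing skew of the simulated functions
mean_even <- sapply(sd_vals, function(s) {
  mean(replicate(100, even_factor(rlnorm(51, meanlog = 0, sdlog = s))))
})

plot(sd_vals, mean_even, type = "b",
     xlab = "sdlog of the simulated functions", ylab = "evenness factor")
abline(h = 0.5, lty = 2)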
I was wondering if the standardization by the maximum could be a problem here. If, for the effective number of species, we standardized all species abundances by their maximum observed abundance, that would also make the communities more 'even'. However, unlike species, different functions are measured on different scales and are not comparable without standardization...