Closed RickoClausen closed 5 years ago
Yes, I can reproduce this. It happens because @langcog has hard-coded "mean" as a column name in tidyboot_mean(). The same issue can be replicated from the documentation example for tidyboot.data.frame if you run tidyboot the long way:
library(dplyr)     # tibble(), bind_rows(), group_by(), summarise()
library(tidyboot)  # tidyboot(), ci_lower(), ci_upper()
# Four site/species groups with different sample sizes and true means.
gauss1 <- tibble(value = rnorm(30, mean = 0, sd = 1), site = 1, spp = 1)
gauss2 <- tibble(value = rnorm(20, mean = 2, sd = 1), site = 1, spp = 2)
gauss3 <- tibble(value = rnorm(50, mean = 1, sd = 1), site = 2, spp = 1)
gauss4 <- tibble(value = rnorm(7, mean = 3, sd = 1), site = 2, spp = 2)
df <- bind_rows(gauss1, gauss2, gauss3, gauss4)
# As provided in the documentation for tidyboot.data.frame, but with one added group.
df %>% group_by(site, spp) %>%
  tidyboot(summary_function = function(x) x %>% summarise(mean = mean(value)),
           statistics_functions = function(x) x %>%
             summarise_at(vars(mean), funs(ci_upper, mean, ci_lower)))
##  site   spp     n empirical_mean ci_upper   mean ci_lower
## <dbl> <dbl> <int>          <dbl>    <dbl>  <dbl>    <dbl>
##     1     1    30         -0.288   0.0931 -0.282   -0.282
##     1     2    20           2.33     2.79   2.32     2.32
##     2     1    50          0.886     1.13  0.888    0.888
##     2     2     7           3.06     4.27   3.08     3.08
It gets worse if you compute mean first, because the result then replaces the mean column that the functions after it run on:
df %>% group_by(site, spp) %>%
  tidyboot(summary_function = function(x) x %>% summarise(mean = mean(value)),
           statistics_functions = function(x) x %>%
             summarise_at(vars(mean), funs(mean, ci_upper, ci_lower)))
##  site   spp     n empirical_mean   mean ci_upper ci_lower
## <dbl> <dbl> <int>          <dbl>  <dbl>    <dbl>    <dbl>
##     1     1    30         -0.288 -0.296   -0.296   -0.296
##     1     2    20           2.33   2.34     2.34     2.34
##     2     1    50          0.886  0.879    0.879    0.879
##     2     2     7           3.06   3.10     3.10     3.10
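The overwrite described above can be seen in isolation with plain dplyr, without tidyboot. This is only a minimal sketch: the toy tibble `boot` and the use of `max()` in place of `ci_upper()` are assumptions made for illustration. It relies on the fact that `summarise()` evaluates its arguments in order, so a later expression sees any column an earlier one has already replaced.

```r
library(dplyr)

# Hypothetical toy data: a column literally named `mean`, standing in for
# the bootstrap replicates that statistics_functions receives.
boot <- tibble(mean = c(1, 2, 3, 10))

# `mean` is summarised first, so the column is replaced by a single value
# before `upper` is evaluated; max() then sees that value, not the replicates.
clash <- boot %>% summarise(mean = mean(mean), upper = max(mean))

# With a non-colliding column name, both summaries see the original values.
ok <- boot %>%
  rename(my_mean = mean) %>%
  summarise(mean = mean(my_mean), upper = max(my_mean))

print(clash)  # upper collapses onto mean
print(ok)     # upper is the true maximum
```

In `clash` the two columns are identical, which is the same pattern as the identical ci_upper and ci_lower values in the output above.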
If you rename the summarised column so that it no longer collides with mean (my_mean below), it works out fine:
df %>% group_by(site, spp) %>%
  tidyboot(summary_function = function(x) x %>% summarise(my_mean = mean(value)),
           statistics_functions = function(x) x %>%
             summarise_at(vars(my_mean), funs(ci_upper, mean, ci_lower)))
##  site   spp     n empirical_my_mean   mean ci_upper ci_lower
## <dbl> <dbl> <int>             <dbl>  <dbl>    <dbl>    <dbl>
##     1     1    30            -0.288 -0.288   0.0589   -0.645
##     1     2    20              2.33   2.34     2.83     1.85
##     2     1    50             0.886  0.882     1.12    0.626
##     2     2     7              3.06   3.09     4.27     1.82
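If the column has to keep the name mean, another option that should avoid the collision is to compute mean last, so the interval bounds are evaluated while the column still holds every replicate. The sketch below uses plain dplyr only. The helpers `lower()` and `upper()` are hypothetical stand-ins for ci_lower() and ci_upper(), defined here as 2.5% and 97.5% quantiles (an assumption, not taken from the tidyboot source), and `boot` is made-up data.

```r
library(dplyr)

# Hypothetical stand-ins for ci_lower()/ci_upper(): plain quantiles.
lower <- function(x) unname(quantile(x, 0.025))
upper <- function(x) unname(quantile(x, 0.975))

# Hypothetical bootstrap replicates in a column named `mean`.
boot <- tibble(mean = c(1, 2, 3, 4, 5))

# Both bounds are computed while `mean` is still the full column;
# the column is only overwritten by the final expression.
stats <- boot %>%
  summarise(ci_lower = lower(mean),
            ci_upper = upper(mean),
            mean     = mean(mean))

print(stats)
```

This depends on argument order, so it is more fragile than renaming the column; the rename shown above is probably the safer workaround.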
It seems strange that ci_upper and mean are the same in your README example. I get the same "bug" when I run it on my dataset. Is there something wrong with statistics_functions? When I changed the order from ci_lower, mean, ci_upper to ci_upper, mean, ci_lower, ci_lower became the same as mean. Thanks.