Closed by roxannebeauclair 6 years ago
Whenever I bootstrap an estimator, I use the `bootstrap` function from the `broom` package, which is pipe compliant. When it comes to doing boot estimates per group, this is how I do it.
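
    library(dplyr)  # %>%, do(), mutate(), summarise(), data_frame()

    # Bootstrap interval for the mean of one column of `tb`
    empirical_boot_mean <- function(tb, column, n = 1e3) {
      alpha <- 0.05  # tail probability per side, so the interval covers 90%
      group_mean <- mean(tb[[column]])
      tb %>%
        broom::bootstrap(n) %>%                            # n bootstrap replicates
        do(data_frame(boot_mean = mean(.[[column]]))) %>%  # mean of each replicate
        mutate(delta = boot_mean - group_mean) %>%
        ungroup() %>%
        summarise(
          lowci = group_mean + quantile(delta, alpha),
          highci = group_mean + quantile(delta, 1 - alpha),
          mean_boot = group_mean
        )
    }
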
    library(tidyr)  # nest(), unnest()
    library(purrr)  # map()

    # One bootstrap interval per cluster
    my_data %>%
      select(cluster, observation) %>%
      group_by(cluster) %>%
      na.omit() %>%
      nest() %>%
      mutate(boot_interval = map(data, ~ empirical_boot_mean(., "observation", n = 4000))) %>%
      unnest(boot_interval) %>%
      select(-data)

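As a sanity check on the interval computed in `empirical_boot_mean`, here is a minimal base R sketch. The sample `x` and the names `boot_means`, `delta_ci`, `percentile_ci`, and `basic_ci` are made up for illustration and are not part of the code above. It suggests that adding quantiles of `delta` back onto the sample mean gives the same numbers as the plain percentile interval, and that with `alpha = 0.05` in each tail the coverage is 90%. The empirical (basic) interval, which subtracts the opposite quantile instead, is computed alongside for comparison.

```r
set.seed(1)
x <- rexp(50)    # hypothetical skewed sample, for illustration only
alpha <- 0.05    # tail probability per side
n_boot <- 2000

sample_mean <- mean(x)
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
delta <- boot_means - sample_mean

# Same form as lowci/highci above: sample mean plus quantiles of delta
delta_ci <- sample_mean + quantile(delta, c(alpha, 1 - alpha), names = FALSE)

# Percentile interval: quantiles of the bootstrap means themselves
percentile_ci <- quantile(boot_means, c(alpha, 1 - alpha), names = FALSE)

# Empirical (basic) interval: subtract the opposite quantile of delta
basic_ci <- sample_mean - quantile(delta, c(1 - alpha, alpha), names = FALSE)

# The first two agree up to floating-point error
all.equal(delta_ci, percentile_ci)
```

For a symmetric bootstrap distribution the basic and percentile intervals nearly coincide, so the difference mostly matters for skewed estimators.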
I implemented group-aware resampling in https://github.com/jrnold/resamplr. All the resampling functions are generic functions with methods for `data.frame` and `grouped_df`. If a grouped data frame is passed to one of them, it resamples groups, resamples within groups, or does both, depending on the method and the arguments given to it.
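To make the two modes concrete, here is a rough base R sketch. It does not use resamplr's actual API: the helpers `resample_clusters` and `resample_within` and the `toy` data are hypothetical names invented for this example. Resampling groups draws whole clusters with replacement and keeps every row of each drawn cluster, so unbalanced cluster sizes are handled naturally. Resampling within groups draws rows with replacement inside each cluster and preserves each cluster's size.

```r
# Toy unbalanced data: clusters a, b, c with 2, 3, and 5 rows (illustrative only)
toy <- data.frame(
  cluster = rep(c("a", "b", "c"), times = c(2, 3, 5)),
  observation = c(1.0, 1.2, 2.1, 2.4, 2.2, 3.0, 3.3, 2.9, 3.1, 3.4)
)

# Resample whole clusters with replacement; all rows of a drawn cluster are kept
resample_clusters <- function(df, group) {
  ids <- unique(df[[group]])
  drawn <- ids[sample.int(length(ids), replace = TRUE)]
  pieces <- lapply(seq_along(drawn), function(i) {
    piece <- df[df[[group]] == drawn[i], , drop = FALSE]
    piece$draw <- i  # label each draw so a cluster drawn twice stays distinct
    piece
  })
  do.call(rbind, pieces)
}

# Resample rows with replacement inside each cluster; cluster sizes are preserved
resample_within <- function(df, group) {
  rows_by_group <- split(seq_len(nrow(df)), df[[group]])
  idx <- unlist(lapply(rows_by_group, function(rows) {
    rows[sample.int(length(rows), replace = TRUE)]
  }), use.names = FALSE)
  df[idx, , drop = FALSE]
}

set.seed(42)
by_cluster <- resample_clusters(toy, "cluster")
within_cluster <- resample_within(toy, "cluster")
```

With unbalanced clusters, the number of rows in `by_cluster` can change from one replicate to the next, because it depends on which clusters were drawn. Any statistic computed on it should group by `draw` rather than `cluster` when a cluster appears more than once.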
That seems very nice! I'll go check it out!
This is out of scope for modelr
I like the bootstrap function and how it can easily be fit into a tidyverse pipeline. That said, is it possible to add a feature that allows the bootstrapping to occur at the cluster level for hierarchical data? An added bonus would be to have the bootstrap work for unbalanced data.