tidyverse / modelr

Helper functions for modelling
https://modelr.tidyverse.org
GNU General Public License v3.0

Feature request for bootstrap() #52

Closed by roxannebeauclair 6 years ago

roxannebeauclair commented 7 years ago

I like the bootstrap function and how it can easily be fit into a tidyverse pipeline. That said, is it possible to add a feature that allows the bootstrapping to occur at the cluster level for hierarchical data? An added bonus would be to have the bootstrap work for unbalanced data.

GuiMarthe commented 7 years ago

Whenever I bootstrap an estimator, I use the bootstrap function from the broom package, which is pipe compliant. When it comes to doing boot estimates per group, this is how I do it.

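library(dplyr)   # %>%, do(), mutate(), summarise(), tibble()
library(tidyr)   # nest(), unnest()
library(purrr)   # map()

# Bootstrap interval for the mean of one column.
# Note: broom::bootstrap() was later deprecated in favor of the rsample package.
empirical_boot_mean <- function(tb, column, n = 1e3) {
  alpha <- 0.05
  group_mean <- mean(tb[[column]])

  tb %>%
    broom::bootstrap(n) %>%
    do(tibble(boot_mean = mean(.[[column]]))) %>%
    mutate(delta = boot_mean - group_mean) %>%
    ungroup() %>%
    # Quantiles at alpha and 1 - alpha give a 90% interval for alpha = 0.05.
    summarise(
      lowci = group_mean + quantile(delta, alpha),
      highci = group_mean + quantile(delta, 1 - alpha),
      mean_boot = group_mean  # the observed sample mean, not a bootstrap average
    )
}

# Resample within each cluster: one interval per cluster.
my_data %>%
  select(cluster, observation) %>%
  group_by(cluster) %>%
  na.omit() %>%
  nest() %>%
  mutate(boot_interval = map(data, ~ empirical_boot_mean(.x, "observation", n = 4000))) %>%
  unnest(boot_interval) %>%
  select(-data)

jrnold commented 7 years ago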

I implemented group-aware resampling in https://github.com/jrnold/resamplr. All the resampling functions are generic functions with methods for data.frame and grouped_df. If a grouped data frame is passed to one of them, it resamples groups, resamples within groups, or does both, depending on the method and the arguments given to it.
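For readers who want to see what resampling whole groups means, here is a minimal sketch of one cluster-level bootstrap replicate. It is an illustration only and does not use the resamplr, modelr, or broom APIs. The helper name cluster_boot_once, the .draw column, and the toy data are all hypothetical. Clusters are drawn with replacement and every row of a drawn cluster is kept, so unbalanced cluster sizes are handled without extra work.

```r
library(dplyr)

# Hypothetical helper, not part of modelr, broom, or resamplr:
# returns one bootstrap replicate in which whole clusters are resampled.
cluster_boot_once <- function(data, cluster_col) {
  # One data frame per cluster; cluster sizes may differ (unbalanced data).
  pieces <- split(data, data[[cluster_col]])
  # Draw as many cluster labels as there are clusters, with replacement.
  drawn <- sample(names(pieces), size = length(pieces), replace = TRUE)
  # Keep every row of each drawn cluster; .draw tells repeated draws apart.
  out <- lapply(seq_along(drawn), function(i) {
    piece <- pieces[[drawn[i]]]
    piece$.draw <- i
    piece
  })
  bind_rows(out)
}

# Toy data (assumed for illustration): three clusters of unequal size.
toy <- data.frame(
  cluster = c("a", "a", "a", "b", "c", "c"),
  observation = c(1, 2, 3, 10, 20, 30),
  stringsAsFactors = FALSE
)

set.seed(1)
boot_means <- replicate(
  1000,
  mean(cluster_boot_once(toy, "cluster")$observation)
)
quantile(boot_means, c(0.025, 0.975))
```

Because each replicate is an ordinary data frame, it can be passed to a model-fitting function in the same way as the output of a row-level bootstrap.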

GuiMarthe commented 7 years ago

That seems very nice! I'll go check it out!

hadley commented 6 years ago

This is out of scope for modelr.