ttimbers / distransam

R package to create random samples with equal N's across different sample groups
Other
0 stars 0 forks source link

a possible alternative to nesting the data frame #3

Open aammd opened 8 years ago

aammd commented 8 years ago

Just a thought, but this approach might work as a possible alternative to the approach you use here


library(purrr)
library(dplyr)
library(tidyr)

## nest the data -- make it one line per combination of grouping factors:

test_data_ts_nested <- test_data %>% 
  nest(time, measurement, .key = timeseries)

## collect the columns that are *not* time series values:
group_cols <- test_data_ts_nested %>% 
  keep(is_atomic)

#create a function to split, resample and recombine a data.frame. # NOTE that it
#assumes that factors go from more general (on the left) to more specific (on
#the right)
sample_groups <- function(df, name){

  ## find a given grouping factor
  max_grp_name <- which(names(df) == name)

  ## unite it with all more general grouping factors. 
  ## NB this assumes factors are ordered from general to specific!!!
  grps <- df %>% 
    unite(code, 1:max_grp_name) %>% 
    .[["code"]]

  group_df_list <- split(df, grps)

  ## how many records are in the smallest part of this list?
  smallest_part_list <- group_df_list %>% 
    map_dbl(nrow) %>% 
    min

  ## resample, and convert back to a data.frame
  map_df(group_df_list,
         sample_n,
         size = smallest_part_list,
         replace = FALSE) ## replace = FALSE so that smallest groups are always the same
}

## resample at all but the lowest level, going from most specific to least 
## specific. This could be a for-loop if you want to make it even more general!

resampled_grps <- group_cols %>% 
  sample_groups("plate") %>% 
  sample_groups("group")

resampled_grps %>% 
  left_join(test_data_ts_nested, by = c("group", "plate", "id"))

## you probably want to do this many times, to obtain the error caused by resampling:
many_replicates <- replicate(100, {group_cols %>% 
    sample_groups("plate") %>% 
    sample_groups("group")}, simplify = FALSE) %>% 
  bind_rows(.id = "rep")

many_replicates %>% 
  left_join(test_data_ts_nested)