statisfactions / simpr

Tidyverse-friendly simulations and power analysis
Accommodate number of simulations within `specify()` for functions that already return multiple simulations #45

Open statisfactions opened 2 years ago

statisfactions commented 2 years ago

If something is already written as a simulation package, the only way to interface simpr with it is to have that package simulate one value at a time, which is hugely inefficient. It would be nice to have a way of accommodating the number of simulations within the `specify()` specification.

statisfactions commented 2 years ago

Here's a start on an idea of how we might be able to make this work. This code takes the output of generate from a data-generating function that already inherently simulates many replications, and splits the output appropriately:

suppressPackageStartupMessages(library(simpr))
library(tidyverse)

# rbinom() already returns 100 draws in one call, so a single rep is generated
out <- specify(
  g1 = ~ rbinom(100, size = 50, prob = 0.5)
) %>%
  generate(1)
out
#> full tibble
#> --------------------------
#> # A tibble: 1 × 3
#>   .sim_id   rep sim               
#>     <int> <int> <list>            
#> 1       1     1 <tibble [100 × 1]>
#> 
#> sim[[1]]
#> --------------------------
#> # A tibble: 100 × 1
#>       g1
#>    <int>
#>  1    20
#>  2    25
#>  3    22
#>  4    33
#>  5    31
#>  6    25
#>  7    23
#>  8    24
#>  9    24
#> 10    28
#> # … with 90 more rows

## Wrangle to more typical shape
out %>%
  rowwise() %>%
  mutate(
    rep_within = list(rep = 1:nrow(sim)),
    sim_within = list(sim = split(sim, 1:nrow(sim)))
  ) %>%
  select(-sim) %>%
  ungroup() %>%
  unnest(c(rep_within, sim_within))
#> # A tibble: 100 × 4
#>    .sim_id   rep rep_within sim_within      
#>      <int> <int>      <int> <named list>    
#>  1       1     1          1 <tibble [1 × 1]>
#>  2       1     1          2 <tibble [1 × 1]>
#>  3       1     1          3 <tibble [1 × 1]>
#>  4       1     1          4 <tibble [1 × 1]>
#>  5       1     1          5 <tibble [1 × 1]>
#>  6       1     1          6 <tibble [1 × 1]>
#>  7       1     1          7 <tibble [1 × 1]>
#>  8       1     1          8 <tibble [1 × 1]>
#>  9       1     1          9 <tibble [1 × 1]>
#> 10       1     1         10 <tibble [1 × 1]>
#> # … with 90 more rows

Created on 2022-02-03 by the reprex package (v2.0.1)

statisfactions commented 2 years ago

I'm thinking the specification for how to split the output should happen in `specify(split_by = ...)` or similar. The wrangling itself would happen within `generate()`, likely in `generate_row()`, where the tibble wrangling takes place.
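
A rough sketch of what a pluggable splitter could look like, written outside of simpr so it runs on its own. Everything here is hypothetical: `split_by_row()` and `apply_split()` are illustrative names, not simpr functions, and the `generated` tibble is a hand-built stand-in for the output of `generate(1)` shown above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical splitter: treat each row of one simulated tibble as a replication
split_by_row <- function(sim) unname(split(sim, seq_len(nrow(sim))))

# Hypothetical helper (not part of simpr): apply a splitter to every element
# of the `sim` list-column, then expand to one row per inner replication
apply_split <- function(generated, split_by = split_by_row) {
  generated %>%
    mutate(
      sim = lapply(sim, split_by),        # each element becomes a list of tibbles
      rep_within = lapply(sim, seq_along) # index of each inner replication
    ) %>%
    unnest(c(rep_within, sim))
}

# Stand-in for the output of generate(1): one row holding 100 draws
generated <- tibble(
  .sim_id = 1L,
  rep = 1L,
  sim = list(tibble(g1 = rbinom(100, size = 50, prob = 0.5)))
)

res <- apply_split(generated)
```

Passing the splitter as a function keeps the reshaping generic: a package that returns replications in columns rather than rows would only need a different `split_by`.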

statisfactions commented 2 years ago

If the user specifies `.reps` within `define()`, `generate()` could even try to guess how to split the output by finding which dimension of the output matches `.reps`, e.g. `split_by = split_guess`, where `split_guess` is a function.
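
One way the guessing could work, as a minimal base R sketch. The signature `split_guess(x, .reps)` is an assumption for illustration, not anything simpr currently defines, and this version only handles vectors, matrices, and data frames. It refuses to guess when zero or several dimensions match `.reps`.

```r
# Hypothetical guesser: split `x` along the one dimension whose size equals .reps
split_guess <- function(x, .reps) {
  dims <- if (is.null(dim(x))) length(x) else dim(x)
  if (length(dims) > 2) {
    stop("split_guess() only handles vectors, matrices, and data frames")
  }

  hit <- which(dims == .reps)
  if (length(hit) != 1) {
    stop("cannot guess the split: ", length(hit),
         " dimension(s) of the output match .reps")
  }

  # Plain vector: every element is one replication
  if (is.null(dim(x))) return(as.list(x))

  # Matrix or data frame: slice along rows (hit == 1) or columns (hit == 2)
  lapply(seq_len(.reps), function(i) {
    if (hit == 1) x[i, , drop = FALSE] else x[, i, drop = FALSE]
  })
}

# 20 replications stored in columns, 5 observations each
m <- matrix(rnorm(5 * 20), nrow = 5)
by_col <- split_guess(m, .reps = 20)

# 100 replications stored in rows, as in the rbinom() example
d <- data.frame(g1 = rbinom(100, size = 50, prob = 0.5))
by_row <- split_guess(d, .reps = 100)
```

The ambiguous case matters in practice: a 5 x 5 output with `.reps = 5` matches both dimensions, so an explicit `split_by` would still be needed there.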