statisfactions / simpr

Tidyverse-friendly simulations and power analysis

Generate multiple columns from a single command #7

Closed statisfactions closed 4 years ago

statisfactions commented 4 years ago

Currently, each argument to variables can only specify a single output column, e.g.

variables(x1 = ~ 1 + rnorm(30),
          x2 = ~ x1 + 1,
          y = ~ x1 + x2) %>%
  gen(1)

So each element of variables is just a single variable. It would be easier to work with other packages and custom functions if we could generate multiple variables at once, for instance so that we could use functions like mvrnorm:

library(MASS)
mvrnorm(30, rep(0, 3), Sigma = diag(3))

which generates a 30 x 3 matrix -- i.e. three variables with one command.
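To make the shape concrete, here is a small check of what mvrnorm returns for the call above. The seed is arbitrary and only there for reproducibility. Because neither the mean vector nor Sigma carries names here, the result has no column names, which is why some naming scheme is needed.

```r
library(MASS)

set.seed(1)  # arbitrary seed, chosen only so the draw is reproducible
m <- mvrnorm(30, mu = rep(0, 3), Sigma = diag(3))

dim(m)       # 30 rows (observations) by 3 columns (variables)
colnames(m)  # NULL: unnamed mu and Sigma give unnamed columns
```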

I see three syntax options for this:

1) Double-sided formulas on unnamed arguments:

variables(cbind(x, y, z) ~ mvrnorm(30, rep(0, 3), Sigma = diag(3)))

2) Named arguments with numbers automatically appended:

## Generates x_1, x_2, x_3
variables(x = ~ mvrnorm(30, rep(0, 3), Sigma = diag(3)), sep = "_")

3) Separate names argument that overwrites the current names:

variables(x = ~ MASS::mvrnorm(30, rep(0, 3), Sigma = diag(3)),
          y = ~ runif(30),
          names = c("SES", "age", "IQ", "SAT_score"))

I don't like 3) because cross-references between variables are confusing. E.g. if I want y to equal the sum of the xs, how do I indicate that?

I plan to implement 1) and 2).
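A minimal sketch of how options 1) and 2) could resolve column names, in base R. The helpers eval_two_sided and expand_names are hypothetical and not part of simpr; they only illustrate the naming rules described above. To keep the sketch free of dependencies, matrix(rnorm(90), ncol = 3) stands in for the mvrnorm call, since with a zero mean and Sigma = diag(3) the columns are independent standard normals.

```r
# Hypothetical helpers for illustration; neither is part of simpr.

# Option 1: the left-hand side of a two-sided formula names the columns.
eval_two_sided <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  out_names <- all.vars(f[[2]])                     # x, y, z from cbind(x, y, z)
  value <- as.matrix(eval(f[[3]], environment(f)))  # evaluate the right-hand side
  stopifnot(ncol(value) == length(out_names))       # one name per column
  colnames(value) <- out_names
  as.data.frame(value)
}

# Option 2: a single name is expanded to name_1, name_2, ... per column.
expand_names <- function(value, name, sep = "_") {
  if (is.null(dim(value))) {
    # A plain vector is a single column and keeps its name unchanged.
    return(setNames(data.frame(value), name))
  }
  value <- as.data.frame(value)
  names(value) <- paste(name, seq_len(ncol(value)), sep = sep)
  value
}

opt1 <- eval_two_sided(cbind(x, y, z) ~ matrix(rnorm(90), ncol = 3))
opt2 <- expand_names(matrix(rnorm(90), ncol = 3), "x")
single <- expand_names(runif(30), "y")

names(opt1)    # "x" "y" "z"
names(opt2)    # "x_1" "x_2" "x_3"
names(single)  # "y"
```

Under either scheme the generated names are known before later variables are evaluated, so a downstream expression such as y = ~ x_1 + x_2 + x_3 can refer to them directly, which is the cross-referencing that option 3) makes awkward.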