trinker / wakefield

Generate random data sets

Correlated fields #30

Closed: DataStrategist closed this issue 2 years ago

DataStrategist commented 2 years ago

Hi! When I train people in data analysis, I often plant specific issues in the data for the students to pick up on. For example, I might have people from one city showing many more deaths, or I might show women outperforming men in 3 regions. The way I do this is:

Create a baseline set where, let's say, education is normally (or otherwise) distributed. Then create a smaller dataset containing the deviation, combine the two with bind_rows(), and randomize the row order.
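A minimal base-R sketch of the workflow above. The column names (city, education, died), the group sizes, and the death rates are illustrative assumptions, not taken from the original post; rbind() stands in for dplyr's bind_rows() so the example has no package dependencies.

```r
# Sketch of the "baseline set + planted deviation" workflow (base R only).
# Column names, sizes, and rates are illustrative assumptions.
set.seed(42)

# 1. Baseline data: education (years) roughly normal, low death rate everywhere
n_base <- 900
base <- data.frame(
  city      = sample(c("A", "B", "C"), n_base, replace = TRUE),
  education = round(rnorm(n_base, mean = 12, sd = 2), 1),
  died      = rbinom(n_base, size = 1, prob = 0.05) == 1
)

# 2. Smaller subset carrying the planted deviation: city "D" with more deaths
n_dev <- 100
deviant <- data.frame(
  city      = "D",
  education = round(rnorm(n_dev, mean = 12, sd = 2), 1),
  died      = rbinom(n_dev, size = 1, prob = 0.25) == 1
)

# 3. Stack the rows (dplyr::bind_rows(base, deviant) would do the same here)
combined <- rbind(base, deviant)

# 4. Shuffle the rows so the deviant cases are not clustered at the end
combined <- combined[sample(nrow(combined)), ]
rownames(combined) <- NULL
```

Running tapply(combined$died, combined$city, mean) is one way for students to spot that city "D" stands out.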

That's all well and good, but I was wondering whether there would be a benefit to some kind of function like:

r_data_frame(
  n = 10, 
  sex(x = c("M", "F"), prob = c(0.9, 0.1)),
  died(correlator = "sex == 'F'", corr_effect = "+20%")
)

where correlator and corr_effect are the proposed new arguments.
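Until something like correlator/corr_effect exists, the same effect can be approximated as a post-processing step. The helper below, add_conditional_outcome(), is hypothetical and not part of wakefield. It assumes "+20%" means a relative increase in the outcome probability (not 20 percentage points) and uses base R only.

```r
# Hypothetical helper, NOT part of wakefield: approximates the proposed
# correlator/corr_effect idea by drawing a logical outcome column whose
# probability is raised for rows matching a condition.
# Assumption: effect is relative, so 0.20 turns p = 0.10 into p = 0.12.
add_conditional_outcome <- function(data, name, base_prob, condition, effect) {
  # Evaluate the condition (e.g. quote(sex == "F")) against the data frame
  hit <- eval(condition, data, parent.frame())
  hit[is.na(hit)] <- FALSE
  # Per-row probability: boosted where the condition holds, capped at 1
  p <- ifelse(hit, pmin(base_prob * (1 + effect), 1), base_prob)
  # Draw the outcome independently for each row
  data[[name]] <- runif(nrow(data)) < p
  data
}

set.seed(1)
df <- data.frame(
  sex = sample(c("M", "F"), 1000, replace = TRUE, prob = c(0.9, 0.1))
)
df <- add_conditional_outcome(df, "died", base_prob = 0.10,
                              condition = quote(sex == "F"), effect = 0.20)
```

tapply(df$died, df$sex, mean) should tend to show a higher death rate for "F", though with only about 100 such rows the difference will be noisy; a larger n or effect makes the planted pattern easier to find.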

DataStrategist commented 2 years ago

Argh... I didn't see it. Duplicate of #3.