trinker / wakefield

Generate random data sets

group() limited to 2 groups? #5

Closed ds4ci closed 9 years ago

ds4ci commented 9 years ago

Thanks for wakefield!

I need to generate factors with more than two levels. group() accepts x with length > 2, but only samples the first two levels. Here is a toy example:

tg <- group(n=100, x=c("a", "b", "c"), name = "test")
summary(tg)
 a  b  c 
43 57  0 

tg <- group(n=100, x=c("a", "b", "c"), prob = c(0.1, 0.2, 0.7), name = "test")
Error in sample.int(n = 2, size = n, replace = TRUE, prob = prob) : 
  incorrect number of probabilities

tg <- group(n=100, x=c("a", "b", "c"), prob = c(0.1, 0.2), name = "test")
summary(tg)
 a  b  c 
32 68  0 

HTH, Jim
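For readers without wakefield installed, the symptom above can be imitated in base R. This is only a sketch: `sample_two_levels` is a hypothetical stand-in, not wakefield's source, and it assumes (based on the `sample.int(n = 2, ...)` call in the error message) that the underlying sampler only ever draws two indices.

```r
# Hypothetical stand-in for a two-level sampler; NOT wakefield's actual code.
sample_two_levels <- function(n, x = c("Control", "Treatment"), prob = NULL) {
    # Only indices 1 and 2 are ever drawn, whatever length(x) is.
    idx <- sample.int(n = 2, size = n, replace = TRUE, prob = prob)
    # All of x is kept as levels, so an unused third level shows up with count 0.
    factor(x[idx], levels = x)
}

set.seed(1)
tg <- sample_two_levels(100, x = c("a", "b", "c"))
table(tg)

# A length-3 prob cannot match n = 2, so sample.int() raises an error;
# tryCatch() captures the message instead of stopping the script.
res <- tryCatch(
    sample_two_levels(100, x = c("a", "b", "c"), prob = c(0.1, 0.2, 0.7)),
    error = function(e) conditionMessage(e)
)
res
```

Under that assumption, the third level is accepted as a factor level but never sampled, which matches the zero count for `c` in the report.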

trinker commented 9 years ago

If you look under the hood of that function you see:

group <- hijack(r_sample_binary_factor,
    name = "Group",
    x = c("Control", "Treatment")
)

This just uses the function r_sample_binary_factor with some defaults for control/treatment. That means it's not possible to use group to make more than 2 groups. I can certainly see why you might expect group to allow n groups, but that was not my intent. I will keep the functionality as is because 2-group sampling is more common in my experience when we create groups, and extending it to n groups means a slower function.

Most of the little variable-generating functions are actually just hijacking a function prefixed r_. So if you can't find a variable function you're looking for, go to the r_ prefixed functions, in this case r_sample_factor:

tg <- r_sample_factor(n=100, x=c("a", "b", "c"), prob = c(0.1, 0.2, 0.7), name = "test")

## > tg <- r_sample_factor(n=100, x=c("a", "b", "c"), prob = c(0.1, 0.2, 0.7), name = "test")
## > summary(tg)
##  a  b  c 
##  8 18 74 
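The "function plus new defaults" pattern described above can be sketched in base R for anyone who wants a reusable multi-level group function. Both `with_defaults` and `sample_levels` are hypothetical names invented for this illustration; they are not part of wakefield and only approximate what hijack and r_sample_factor appear to do in this thread.

```r
# Hypothetical helper: copy a function and overwrite some of its default arguments.
# Note: passing NULL as a new default would drop the argument, so avoid that here.
with_defaults <- function(FUN, ...) {
    new_defaults <- list(...)
    for (arg in names(new_defaults)) {
        formals(FUN)[[arg]] <- new_defaults[[arg]]
    }
    FUN
}

# Hypothetical n-level sampler written in base R.
sample_levels <- function(n, x = c("a", "b"), prob = NULL) {
    factor(sample(x, size = n, replace = TRUE, prob = prob), levels = x)
}

# A three-group variable function built by swapping in new default levels.
dose_group <- with_defaults(sample_levels, x = c("Low", "Medium", "High"))

set.seed(42)
dg <- dose_group(300, prob = c(0.1, 0.2, 0.7))
table(dg)
```

With wakefield loaded, the analogous move would presumably be to hijack r_sample_factor with your own name and x defaults, assuming hijack is available to package users.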

I'm going to close this for now. Feel free to reopen if this does not address your concerns. I'll update the documentation to be clearer.