kgoldfeld / simstudy

simstudy: Illuminating research methods through data generation
https://kgoldfeld.github.io/simstudy/
GNU General Public License v3.0

Treatment values change when `ratio` argument is used? #213

Closed maxdrohde closed 1 year ago

maxdrohde commented 1 year ago

I am confused by the following result. Why does adding the ratio argument change the treatment label from 0,1 to 1,2?

Thank you for creating this package, just curious to know if I'm missing something here!

library(simstudy)
library(data.table)

Case 1

Input

dd <- genData(10)
dd <- trtAssign(dd,
                nTrt = 2,
                ratio = c(1,3),
                balanced = FALSE,
                grpName = "tx")

print(dd$tx)

Output

 [1] 1 2 2 1 1 2 1 2 2 1

Case 2

Input

dd <- genData(10)
dd <- trtAssign(dd,
               nTrt = 2,
               #ratio = c(1,3),
               balanced = FALSE,
               grpName = "tx")

print(dd$tx)

Output

[1] 0 1 0 1 0 0 0 1 0 0

assignUser commented 1 year ago

I had a look at the code, and the difference is this line: https://github.com/kgoldfeld/simstudy/blob/d40d33d4c869ff568608d828a11ea60c0b20a6d9/R/group_data.R#L391. If we change this to c(.5, .5), it produces the same output, because trtObserve uses length(formulas) to set ncat, which is then used to generate the values.

That line takes advantage of the fact that trtObserve adds a 'remainder' column to the matrix used to generate the values, but it is also what produces this inconsistent result. Unless @kgoldfeld has objections, I would say it makes sense to apply the minor change and make the results (and the code) consistent.

assignUser commented 1 year ago

Also @maxdrohde thanks for the well structured issue with reprex and everything 10/10! :tada:

kgoldfeld commented 1 year ago

I agree that the result is not ideal, and I agree that it should be changed. I do have concerns that it might impact some users who have learned to live with it.

As an aside, if you use trtAssign as a distribution in a dataDef, the results are more what you would expect:

d <- defData(varname = "tx", formula = "1;3", dist = "trtAssign")
genData(1000, d)[, table(tx)]

tx
  0   1
250 750

d <- defData(varname = "tx", formula = "1;2;3", dist = "trtAssign")
genData(1000, d)[, table(tx)]

tx
  1   2   3
167 334 499

kgoldfeld commented 1 year ago

I just want to make clear what would be the ideal behavior. It seems to me that with two categories, the result should always be 0/1, and never 1/2. @maxdrohde Is that what you were thinking as well?

maxdrohde commented 1 year ago

@kgoldfeld Yes, just using 0/1 sounds good to me. My main concern was just that it wasn't consistent. Thanks for looking into this!

kgoldfeld commented 1 year ago

@maxdrohde Just wanted to let you know that the behavior of trtAssign is now consistent: 0/1 is generated with 2 treatment arms, and 1/2/3/... is used with more than 2 arms. The changes are available in the development version here on GitHub.
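
For anyone still on a release that returns 1/2 for two arms when `ratio` is supplied, the labels can be recoded after the fact. The following is only a minimal sketch in base R: `recode_two_arm` is a hypothetical helper, not part of simstudy, and it assumes the treatment column is a numeric vector in which both arms actually appear.

```r
# Hypothetical helper (not part of simstudy): recode a two-arm
# assignment labelled 1/2 to 0/1 and leave every other coding untouched.
recode_two_arm <- function(tx) {
  arms <- sort(unique(tx))
  # Only recode when the observed labels are exactly 1 and 2
  if (identical(as.numeric(arms), c(1, 2))) {
    return(tx - 1)
  }
  tx
}

# Labels taken from the Case 1 output in this issue (ratio = c(1, 3))
case1 <- c(1, 2, 2, 1, 1, 2, 1, 2, 2, 1)
print(recode_two_arm(case1))

# Assignments with more than two arms keep their 1/2/3 coding
print(recode_two_arm(c(1, 2, 3, 3, 2, 1)))
```

Because the check looks at the observed labels, a small sample in which every unit lands in the same arm would be returned unchanged, so this is best treated as a stopgap until the updated package is installed.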