insightsengineering / rbmi

Reference based multiple imputation R package
https://insightsengineering.github.io/rbmi/

Default values for impute's references #286

Open gowerc opened 2 years ago

gowerc commented 2 years ago

I was wondering whether we should provide default values for references in impute? I imagine that in 90% of cases people are just going to do c("placebo" = "placebo", "treatment" = "placebo"), so I'm wondering if we should automatically infer this from the labels of the group variable if nothing is passed to the argument (I think it would make things slightly easier for users).

i.e. (rough sketch)

if (is.null(references)) {
    labels <- levels(groups)
    # Both groups use the first level as their reference
    references <- c(labels[1], labels[1])
    names(references) <- labels[1:2]
}
gowerc commented 2 years ago

@wolbersm & @nociale

nociale commented 2 years ago

@gowerc How is it possible to understand which level of vars$group refers to control and which to intervention, without any specification from the user?

I agree that most cases will be "Control" = "Control", "Intervention" = "Control", but I think specifying this explicitly is easy and clear for the user.

wolbersm commented 2 years ago

@gowerc I agree with @nociale and would prefer no default.

To me, the only setting where we should use default values for references is the case where all ICEs are imputed under "MAR"; in that case, references should be c("group1" = "group1", "group2" = "group2", etc.).
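
A minimal sketch of that MAR-only default, in base R. The helper name default_references and its strategies argument are hypothetical, used only for illustration; they are not taken from rbmi.

```r
# Hypothetical helper (not part of rbmi): map every group to itself,
# but only when all strategies are "MAR".
default_references <- function(group, strategies) {
    stopifnot(is.factor(group))
    if (!all(strategies == "MAR")) {
        stop("`references` must be supplied when any strategy other than MAR is used")
    }
    lvls <- levels(group)
    # Names are the groups, values are their references (here: themselves)
    setNames(lvls, lvls)
}

group <- factor(c("group1", "group2", "group1", "group3"))
refs <- default_references(group, strategies = c("MAR", "MAR"))
```

With any non-MAR strategy present the helper stops instead of guessing, which matches the preference for an explicit user choice in that situation.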

gowerc commented 2 years ago

I was thinking of just using the standard factor convention, where the first level is considered to be the reference/control.

i.e.

lvls <- levels(group)
if (length(lvls) != 2) stop("need exactly 2 groups to automatically decide")
# Names are the groups; values are the reference (first level) for each group
references <- c(lvls[1], lvls[1])
names(references) <- lvls

I just figured that this likely covers 90% of real-world use cases, so it would make sense to default to it to save users some hassle. No worries if you feel it's more likely to add complications / confusion.

wolbersm commented 2 years ago

Easy enough, but I still feel it's better to force the user to provide the references manually whenever reference-based imputation is used.
For example, if for some reason the first level of the factor is the intervention (which I guess can happen if you call the groups "intervention" and "placebo" and naively use factor(), which orders levels alphabetically), then this gives completely wrong results, and that is not obvious from the outputs.
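
The alphabetical-ordering pitfall can be illustrated in base R. The helper first_level_default is hypothetical and only mimics the "first level is the reference" default discussed in this thread.

```r
arm <- c("placebo", "intervention", "placebo", "intervention")

# Naive factor(): levels are sorted alphabetically, so "intervention" comes first
naive <- factor(arm)

# Explicit level order: "placebo" is the first level
explicit <- factor(arm, levels = c("placebo", "intervention"))

# Hypothetical default: every group uses the first level as its reference
first_level_default <- function(group) {
    lvls <- levels(group)
    setNames(rep(lvls[1], length(lvls)), lvls)
}

refs_naive <- first_level_default(naive)       # both arms reference "intervention"
refs_explicit <- first_level_default(explicit) # both arms reference "placebo"
```

Both calls run without any warning, so nothing in the result signals that the naive version picked the wrong reference arm.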