Return original subject IDs in the imputed datasets

nociale commented 1 year ago

It would be better if the subjid variable in the imputed datasets had the same subject IDs as in the input data:

library(rbmi)

data("antidepressant_data")
dat <- antidepressant_data

dat <- expand_locf(
    dat,
    PATIENT = levels(dat$PATIENT), # expand by PATIENT and VISIT 
    VISIT = levels(dat$VISIT),
    vars = c("BASVAL", "THERAPY"), # fill BASVAL and THERAPY using LOCF
    group = c("PATIENT"),
    order = c("PATIENT", "VISIT")
)
vars <- set_vars(
    outcome = "CHANGE",
    visit = "VISIT",
    subjid = "PATIENT",
    group = "THERAPY",
    covariates = c("BASVAL*VISIT", "THERAPY*VISIT")
)
method <- method_condmean(type = "bootstrap", n_samples = 0)
drawObj <- draws(
    data = dat,
    data_ice = NULL,
    vars = vars,
    method = method,
    quiet = TRUE
)
imputeObj <- impute(drawObj)
d <- extract_imputed_dfs(imputeObj)[[1]]
head(d$PATIENT) # New IDs (imputed data)
head(dat$PATIENT) # Original IDs (input data)

This would be useful for further analyses that need the original IDs (e.g. to join two datasets by subject ID).

Was it necessary to change the IDs?

gowerc commented 1 year ago

It was essential to change the patient IDs to keep them unique when an unstructured covariance matrix is specified. Otherwise, observations from multiple resampled patients that were drawn from the same original patient would be grouped together as if they belonged to a single patient. If memory serves me right, there is an argument to extract_imputed_dfs() that adds an attribute to the data frame, which can be used to map the new IDs back to the original ones.
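To illustrate why resampled patients need fresh IDs, here is a minimal base-R sketch. It is not rbmi's internal code: the toy patients "A", "B" and "C", the fixed bootstrap draw in which "A" is picked twice, and the "new_pt_<n>" naming scheme are all assumptions made for the example. The idmap vector mimics a named vector whose names are the new IDs and whose values are the original IDs.

```r
# Toy input data (hypothetical): 3 patients, 2 visits each
dat <- data.frame(
  PATIENT = rep(c("A", "B", "C"), each = 2),
  VISIT = rep(c("1", "2"), times = 3),
  stringsAsFactors = FALSE
)

# A fixed bootstrap draw of patients (with replacement): "A" appears twice
sampled <- c("A", "A", "C")

# Give every drawn patient a fresh, unique ID (naming scheme is an assumption)
new_ids <- paste0("new_pt_", seq_along(sampled))
boot <- do.call(rbind, lapply(seq_along(sampled), function(i) {
  rows <- dat[dat$PATIENT == sampled[i], ]
  rows$PATIENT <- new_ids[i]
  rows
}))
rownames(boot) <- NULL

# Map back to the original IDs: names = new IDs, values = original IDs
idmap <- setNames(sampled, new_ids)

# With the new IDs, each subject has exactly one row per visit
table(boot$PATIENT, boot$VISIT)

# With the original IDs, "A" would have two rows per visit, so a per-subject
# (unstructured) covariance model would treat two independent draws as one patient
table(unname(idmap[boot$PATIENT]), boot$VISIT)
```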

nociale commented 1 year ago

Indeed, setting the argument idmap = TRUE adds an idmap attribute to the returned data frame. This attribute is a named vector whose values are the original IDs and whose names are the new IDs.

A simple way to add the original IDs to an imputed dataset:

d <- extract_imputed_dfs(imputeObj, idmap = TRUE)[[1]]
idmap <- attributes(d)$idmap
d$original_id <- idmap[match(d[[vars$subjid]], names(idmap))]
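The one-liner above can be wrapped in a small helper and applied to every imputed dataset. This sketch assumes only what the thread states: extract_imputed_dfs(..., idmap = TRUE) returns a list of data frames, each carrying an "idmap" attribute (names are new IDs, values are original IDs). The function name add_original_id() is hypothetical and not part of rbmi, and the mock data frame stands in for real rbmi output.

```r
# Hypothetical helper (not part of rbmi): add the original subject IDs to one
# imputed dataset using its "idmap" attribute (names = new IDs, values = original IDs)
add_original_id <- function(df, subjid, col = "original_id") {
  idmap <- attr(df, "idmap")
  if (is.null(idmap)) {
    stop("No 'idmap' attribute found; extract the datasets with idmap = TRUE")
  }
  # Look up each row's new ID among the names of the map
  df[[col]] <- unname(idmap[match(as.character(df[[subjid]]), names(idmap))])
  df
}

# Mock imputed dataset standing in for extract_imputed_dfs(..., idmap = TRUE)[[i]]
d_mock <- data.frame(
  PATIENT = factor(c("new_pt_1", "new_pt_1", "new_pt_2", "new_pt_2")),
  CHANGE = c(-1.5, -3.0, 0.5, -2.0)
)
attr(d_mock, "idmap") <- c(new_pt_1 = "P1", new_pt_2 = "P2")

# Apply to every dataset in the list, as one might with the real rbmi output:
#   dfs <- extract_imputed_dfs(imputeObj, idmap = TRUE)
#   dfs <- lapply(dfs, add_original_id, subjid = vars$subjid)
dfs <- lapply(list(d_mock, d_mock), add_original_id, subjid = "PATIENT")
dfs[[1]]$original_id
```

Failing loudly when the attribute is missing makes it obvious if the datasets were extracted without idmap = TRUE, rather than silently filling the column with NA.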

Thanks a lot. I will close this issue.

gowerc commented 1 year ago

@nociale, I have re-opened the issue as I think it would be worth adding something more explicit about this to one of the vignettes.

gowerc commented 1 year ago

Alternatively, maybe it's worth updating the function to add an "original_id" column instead of just returning the attribute?

nociale commented 1 year ago

Yes, good idea! We could either (1) make idmap = TRUE the default, or (2) return the original IDs instead of the modified IDs. If we go with the latter, we could remove the idmap argument if it is no longer needed. My preference is for (2).

insightsengineering / rbmi, issue #382