Function to iterate over categorical variable values and make new data sets

atredennick commented 10 months ago

Not sure if this is too "inside baseball" to be relevant to a wider community, or whether the best place for it is pmforest or yspec. To make forest plots, we generally simulate from a fitted model given new data sets in which one variable is changed per iteration. A function that takes a spec and a vector of variables, scans all possible values, and generates new data sets for the reference case and for each univariate perturbation would be extremely helpful.
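
To make the request concrete, here is a minimal base R sketch of the behaviour described. It is only an illustration: `toy_spec` is a plain named list standing in for a real spec object, `make_newdata` is a hypothetical name, and taking the lowest value of each variable as the reference level is an assumption.

```r
# Hypothetical stand-in for a spec: each categorical variable maps to its
# allowed numeric values. A real yspec object would need its own accessor.
toy_spec <- list(
  SEX = c(0, 1),
  RF  = c(0, 1, 2)
)

make_newdata <- function(spec, vars) {
  # Reference row: the lowest value of every requested variable (assumption).
  reference <- as.data.frame(lapply(spec[vars], min))

  # One copy of the reference row per non-reference value, with a single
  # variable changed each time.
  perturbed <- list()
  for (v in vars) {
    for (val in setdiff(spec[[v]], reference[[v]])) {
      row <- reference
      row[[v]] <- val
      perturbed[[paste(v, val, sep = "=")]] <- row
    }
  }

  list(reference = reference, perturbations = perturbed)
}

nd <- make_newdata(toy_spec, c("SEX", "RF"))
```

Each element of `nd$perturbations` is a one-row data frame that differs from `nd$reference` in exactly one column, which is the shape needed for univariate forest plot simulations.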

barrettk commented 10 months ago

Hey @atredennick, do you have any scripts or illustrations of what you're asking for? We could definitely do something like that, but I'll talk to some other TS folks to see where something like this should live.

atredennick commented 9 months ago

Not sure if this is super helpful out of context, but here is what I cobbled together recently:

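library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# NOTE: get_values() is a project helper that is not shown in this thread.
# From its use here it is assumed to return `value` and `code` columns for
# a variable. `.conts` and `.spec` are accepted but not used in this body.
process_cats <- function(.cats, .conts, .spec = spec) {
  # Look up every level of each categorical variable and label the lowest
  # value as the reference; all other levels are perturbations.
  cat_codes <- tibble(variable = .cats) %>%
    mutate(decodes = map(.x = variable, .f = ~ get_values(.var = .x))) %>%
    unnest(cols = decodes) %>%
    filter(code != "Missing") %>%
    group_by(variable) %>%
    arrange(variable, value) %>%
    mutate(case = case_when(
      value == min(value) ~ "reference",
      TRUE ~ "perturbation"
    )) %>%
    ungroup()

  # All reference levels, collapsed into a single nested data frame.
  ref_cats <- cat_codes %>%
    filter(case == "reference") %>%
    dplyr::select(-case) %>%
    nest(ref_df = c(variable, value, code))

  # One row per perturbed level, with its value and code nested.
  pert_cats <- cat_codes %>%
    filter(case == "perturbation") %>%
    dplyr::select(-case) %>%
    mutate(pert_value = value) %>%
    nest(pert_df = c(value, code))

  # Attach the full reference table to every perturbation row.
  cat_dfs <- pert_cats %>%
    crossing(ref_cats)

  # Project-specific: remove rows where variable is TRT2 and pert_value is 1
  # because this combination is never seen in the training data.
  cat_dfs <- cat_dfs %>%
    filter(!(variable == "TRT2" & pert_value == 1))

  return(cat_dfs)
}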

Followed by this function:

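set_perturbations <- function(.var, .ref, .pert) {
  # Row holding the perturbed level for the variable of interest.
  new_row <- tibble(variable = .var, value = .pert$value, code = .pert$code)
  # Swap that row into the reference table, then spread to one wide row
  # with one column per variable.
  out <- .ref %>%
    filter(variable != .var) %>%
    bind_rows(new_row) %>%
    arrange(variable) %>%
    dplyr::select(-code) %>%
    pivot_wider(names_from = variable, values_from = value)
  return(out)
}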
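
The thread does not show how the two functions are chained. One possible way, sketched here under assumptions, is to map over the rows of the `process_cats()` output. The `cat_dfs` table below is built by hand as a hypothetical stand-in for that output, with made-up variables and codes, and `set_perturbations()` is repeated so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Repeated from the comment above so that this block is self-contained.
set_perturbations <- function(.var, .ref, .pert) {
  new_row <- tibble(variable = .var, value = .pert$value, code = .pert$code)
  .ref %>%
    filter(variable != .var) %>%
    bind_rows(new_row) %>%
    arrange(variable) %>%
    dplyr::select(-code) %>%
    pivot_wider(names_from = variable, values_from = value)
}

# Hypothetical reference table: the lowest code for each variable.
ref_tbl <- tibble(
  variable = c("RF", "SEX"),
  value    = c(0, 0),
  code     = c("Normal", "Male")
)

# Hypothetical stand-in for process_cats() output: one row per perturbed
# level, each carrying the full reference table.
cat_dfs <- tibble(
  variable   = c("RF", "RF", "SEX"),
  pert_value = c(1, 2, 1),
  pert_df    = list(
    tibble(value = 1, code = "Mild"),
    tibble(value = 2, code = "Moderate"),
    tibble(value = 1, code = "Female")
  ),
  ref_df     = list(ref_tbl, ref_tbl, ref_tbl)
)

# Build one single-row data set per perturbation.
new_data <- cat_dfs %>%
  mutate(sim_df = pmap(
    list(variable, ref_df, pert_df),
    function(v, r, p) set_perturbations(.var = v, .ref = r, .pert = p)
  ))

# Stack the perturbed rows into one table for simulation.
sim_input <- bind_rows(new_data$sim_df)
```

Each row of `sim_input` keeps every variable at its reference value except the one being perturbed.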

barrettk commented 9 months ago

@atredennick thanks so much for putting that together! Seth proposed the idea of making a PR on the example project, to show how it could look there first. If you're able to do that let me know!

atredennick commented 9 months ago

Sounds good! Will do. (Might take a few days).

barrettk commented 9 months ago

Hey @atredennick, just wanted to check back in to see if you had any status updates?

atredennick commented 8 months ago

Actually, for a recent project, Todd has developed some functions for this.

barrettk commented 8 months ago

@atredennick Is this internal function sufficient, or do you think a package function would still be ideal? Would love to look at it if it can be ported over easily!

metrumresearchgroup / pmforest

Function to iterate over categorical variable values and make new data sets #36