Open atredennick opened 10 months ago
Hey @atredennick, just wondering if you had any scripts or illustrations of doing some of what you're asking for? We could definitely do something like that, but I will talk to some other TS folks to see where something like this should live.
Not sure if this is super helpful out of context, but here is what I cobbled together recently:
process_cats <- function(.cats, .conts, .spec = spec) {
  # NOTE: .conts and .spec are not used below, and get_values() is not shown here.
  # Look up the coded values of each categorical variable and flag the
  # smallest value of each variable as the reference level.
  cat_codes <- tibble(variable = .cats) %>%
    mutate(decodes = map(.x = variable, .f = ~ get_values(.var = .x))) %>%
    unnest(cols = decodes) %>%
    filter(code != "Missing") %>%
    group_by(variable) %>%
    arrange(variable, value) %>%
    mutate(case = case_when(
      value == min(value) ~ "reference",
      TRUE ~ "perturbation"
    )) %>%
    ungroup()

  # Single nested data frame holding the reference level of every variable
  ref_cats <- cat_codes %>%
    filter(case == "reference") %>%
    dplyr::select(-case) %>%
    nest(ref_df = c(variable, value, code))

  # One row per non-reference level, with its value and code nested
  pert_cats <- cat_codes %>%
    filter(case == "perturbation") %>%
    dplyr::select(-case) %>%
    mutate(pert_value = value) %>%
    nest(pert_df = c(value, code))

  # Pair every perturbation with the full set of reference levels
  cat_dfs <- pert_cats %>%
    crossing(ref_cats)

  # need to remove rows where variable is TRT2 and pert_value is 1
  # because this is never seen in the training data
  cat_dfs <- cat_dfs %>%
    filter(!(variable == "TRT2" & pert_value == 1))

  return(cat_dfs)
}
Followed by this function:
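set_perturbations <- function(.var, .ref, .pert) {
  # Row holding the perturbed value for the variable being changed
  new_row <- tibble(variable = .var, value = .pert$value, code = .pert$code)

  # Swap the reference row for .var with the perturbed row, then spread to
  # one column per variable
  out <- .ref %>%
    filter(variable != .var) %>%
    bind_rows(new_row) %>%
    arrange(variable) %>%
    dplyr::select(-code) %>%
    pivot_wider(names_from = variable, values_from = value)

  return(out)
}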
@atredennick thanks so much for putting that together! Seth proposed the idea of making a PR on the example project, to show how it could look there first. If you're able to do that let me know!
Sounds good! Will do. (Might take a few days).
Hey @atredennick, just wanted to check back in to see if you had any status updates?
Actually, for a recent project, Todd has developed some functions for this.
@atredennick Is this internal function sufficient, or do you think a package function would still be ideal? Would love to look at it if it can be ported over easily!
Not sure if this is too "inside baseball" to be relevant to a wider community. Also not sure if the best place for this is pmforest or yspec. For making forest plots, we generally are simulating from a fitted model given new data sets where one variable is changed per iteration. A function that takes in a `spec` and vector of variables, scans all possible values, and generates new datasets for reference and univariate perturbations would be extremely helpful.
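To make the request concrete, here is a minimal base-R sketch of that kind of helper. It is only an illustration under stated assumptions: `make_perturbation_sets()` and `example_levels` are hypothetical names, the values are made up, and a real version would presumably pull the allowed values from the yspec `spec` object rather than from a hand-written list. As in the `process_cats()` snippet in this thread, the smallest coded value of each variable is treated as the reference.

```r
# Hypothetical helper (not part of yspec or pmforest).
# `levels` is a named list giving the coded values of each variable;
# in practice these would be looked up from the spec.
make_perturbation_sets <- function(levels) {
  # Reference row: the smallest coded value of every variable.
  reference <- as.data.frame(lapply(levels, min))

  # For each variable, build one row per non-reference value,
  # holding all other variables at their reference value.
  perturbed <- lapply(names(levels), function(var) {
    alt_values <- setdiff(sort(levels[[var]]), reference[[var]])
    rows <- reference[rep(1, length(alt_values)), , drop = FALSE]
    rows[[var]] <- alt_values
    data.frame(
      perturbed = rep(var, length(alt_values)),
      rows,
      stringsAsFactors = FALSE
    )
  })

  # Stack the reference row on top of all univariate perturbations.
  out <- rbind(
    data.frame(perturbed = "reference", reference, stringsAsFactors = FALSE),
    do.call(rbind, perturbed)
  )
  rownames(out) <- NULL
  out
}

# Made-up values, for illustration only.
example_levels <- list(SEX = c(0, 1), RF = c(1, 2, 3))
sets <- make_perturbation_sets(example_levels)
print(sets)
```

Each row of `sets` is one scenario to simulate from: the first row is the reference, and every later row differs from it in exactly one variable, named in the `perturbed` column. Exclusions such as the TRT2 case above would still need to be filtered out by hand.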