pharmaverse / admiral

ADaM in R Asset Library
https://pharmaverse.github.io/admiral
Apache License 2.0
222 stars 62 forks source link

Refactor for admiral's use within GSK's ADaM metadata driven system #2441

Closed bms63 closed 4 months ago

bms63 commented 4 months ago

Discussed in https://github.com/pharmaverse/admiral/discussions/2349

Originally posted by **bms63** February 23, 2024 Why some existing admiral functions are not a good fit to be used in Executor Thursday, February 22, 2024 1:33 PM Executor is designed to control the inputs and outputs of each derivation, so that when selecting multiple derivations from the library the system can recognize the order of their execution automatically. Information about inputs and outputs of each derivation is collected as metadata from the user and in order to ensure consistency between what is specified in the metadata and what is actually being used in the derivation the frame work performs the initial filtering and joining of the source data as well as joining the result on to output dataset itself, taking this actions off the user's table. This makes functions like the following: ![image](https://github.com/pharmaverse/admiral/assets/10111024/9d179af8-c9eb-4074-934a-0f2f4e0e01ed) and other functions utilizing derive_vars_merged function under the hood not useful with Executor. Proposal: For each function using derive_vars_merged abstract the part before the derive_vars_merged call into a separate function and call is from the main function. e.g. ``` derive_var_merged_exist_flag <- function(dataset, dataset_add, by_vars, new_var, condition, true_value = "Y", false_value = NA_character_, missing_value = NA_character_, filter_add = NULL) { new_var <- assert_symbol(enexpr(new_var)) condition <- assert_filter_cond(enexpr(condition)) filter_add <- assert_filter_cond(enexpr(filter_add), optional = TRUE) add_data <- filter_if(dataset_add, filter_add) %>% mutate(!!new_var := if_else(!!condition, 1, 0, 0)) derive_vars_merged( dataset, dataset_add = add_data, by_vars = by_vars, new_vars = exprs(!!new_var), order = exprs(!!new_var), check_type = "none", mode = "last" ) %>% mutate(!!new_var := if_else(!!new_var == 1, true_value, false_value, missing_value)) } ``` Turned into: ``` exist_flag <- function(df, new_var, condition, true_value = "Y", false_value = NA_character_, missing_value = NA_character_, filter_add = NULL ) { new_var <- assert_symbol(enexpr(new_var)) condition <- assert_filter_cond(enexpr(condition)) filter_add <- assert_filter_cond(enexpr(filter_add), optional = TRUE) add_data <- filter_if(dataset_add, filter_add) %>% mutate(!!new_var := if_else(!!condition, 1, 0, 0)) } derive_var_merged_exist_flag <- function(dataset, dataset_add, by_vars, new_var, condition, true_value = "Y", false_value = NA_character_, missing_value = NA_character_, filter_add = NULL) { add_data <- exist_flag(df, new_var, condition, true_value, false_value, missing_value, filter_add) derive_vars_merged( dataset, dataset_add = add_data, by_vars = by_vars, new_vars = exprs(!!new_var), order = exprs(!!new_var), check_type = "none", mode = "last" ) %>% mutate(!!new_var := if_else(!!new_var == 1, true_value, false_value, missing_value)) } ``` This will allow users, who are not keen on directly using merging onto output dataset make use of admiral functionality without maintaining a separate fork of the repo. The proposal is that: • Admiral or GSK Mercury team implements the proposed change and raises as PR in admiral repository • Admiral team prioritizes the review of the PR and release to make sure Mercury can meet our timelines (release end of Q2 latest) • Implementing abstracting functions like exist_flag becomes standard practice in admiral Alexey