insightsengineering / teal

Exploratory Web Apps for Analyzing Clinical Trial Data
https://insightsengineering.github.io/teal/
Other
181 stars 39 forks source link

Show R Code is missing some codes #1088

Closed donyunardi closed 9 months ago

donyunardi commented 9 months ago

Steps to reproduce:

  1. Go to https://genentech.shinyapps.io/nest_early-dev_dev/
  2. Open the Waterfall Plot module
  3. Click Show R Code
  4. Copy the code and run it locally

image

The Show R code ```r # Add any code to install/load your NEST environment here library(shiny) library(teal.code) library(teal.data) library(teal.slice) library(teal) library(magrittr) library(teal.transform) library(formatters) library(rtables) library(tern) library(teal.modules.clinical) library(ggplot2) library(ggmosaic) library(shinyTree) library(teal.modules.general) library(dplyr) library(osprey) library(teal.osprey) library(scda) library(scda.2022) library(nestcolor) library(sparkline) library(scda) library(scda.2022) library(dplyr) library(nestcolor) library(sparkline) ADSL <- synthetic_cdisc_data("latest")$adsl adsl_labels <- teal.data::col_labels(ADSL, fill = FALSE) ADSL <- ADSL %>% mutate(TRTDURD = as.numeric(as.Date(TRTEDTM) - as.Date(TRTSDTM)) + 1, DTHFL = ifelse(!is.na(DTHDT), "Y", NA), EOSSTT = factor(EOSSTT, levels = c("COMPLETED", "ONGOING", "DISCONTINUED"))) %>% teal.data::col_relabel(TRTDURD = "Treatment Duration in Days", DTHFL = "Death Flag", DCSREAS = "Reason for Study Discontinuation", EOSSTT = "End of Study Status") %>% droplevels() teal.data::col_labels(ADSL)[c(names(adsl_labels))] <- adsl_labels stopifnot(rlang::hash(ADSL) == "30e7c11581bd01098a8be28263ef4995") ADAE <- synthetic_cdisc_data("latest")$adae adae_labels <- teal.data::col_labels(ADAE, fill = FALSE) ADAE <- ADAE %>% mutate_at(c("AESOC", "AEBODSYS", "AEHLT", "AEDECOD", "AETERM", "AELLT"), as.character) %>% mutate(RELFL = ifelse(AEREL == "Y", "Y", "N"), CTC35FL = ifelse(AETOXGR %in% c("3", "4", "5"), "Y", "N"), SERFL = ifelse(AESER == "Y", "Y", "N"), RELSERFL = ifelse(AEREL == "Y" & AESER == "Y", "Y", "N"), AEREL1 = (AEREL == "Y" & ACTARM == "A: Drug X"), AEREL2 = (AEREL == "Y" & ACTARM == "B: Placebo"), ASTDT = as.Date(ASTDTM), AENDT = as.Date(AENDTM)) %>% teal.data::col_relabel(RELFL = "Related AE", CTC35FL = "Grade >=3 AE", SERFL = "Serious AE", RELSERFL = "Related Serious AE", AEREL1 = "AE related to A: Drug X", AEREL2 = "AE related to B: Placebo", ASTDT = "Analysis Start Date", AENDT = "Analysis End Date", AESOC = "Primary System Organ Class", AEBODSYS = "Body System or Organ Class", AEHLT = "High Level Term", AEDECOD = "Dictionary-Derived Term", AETERM = "Reported Term for the Adverse Event", AELLT = "Lowest Level Term") ADCM <- synthetic_cdisc_data("latest")$adcm ADCM <- ADCM %>% mutate(ASTDT = as.Date(ASTDTM), AENDT = as.Date(AENDTM)) %>% teal.data::col_relabel(ASTDT = "Analysis Start Date", AENDT = "Analysis End Date") ADEX <- synthetic_cdisc_data("latest")$adex ADEX <- ADEX %>% mutate(ASTDT = as.Date(ASTDTM), AENDT = as.Date(AENDTM)) %>% teal.data::col_relabel(ASTDT = "Analysis Start Date", AENDT = "Analysis End Date") ADTR <- synthetic_cdisc_data("latest")$adtr ADTR <- ADTR %>% mutate(PCHG = ifelse(AVISIT == "BASELINE", 0, PCHG), CHG = ifelse(AVISIT == "BASELINE", 0, CHG), AVAL = ifelse(AVISIT == "BASELINE", BASE, AVAL), AVALC = ifelse(AVISIT == "BASELINE", as.character(BASE), AVALC)) %>% filter(AVISIT != "SCREENING") ADTRWF <- ADTR %>% filter(AVISIT != "BASELINE") ADRS <- synthetic_cdisc_data("latest")$adrs ADRS <- ADRS %>% filter(PARAMCD %in% c("BESRSPI", "INVET")) %>% mutate(ADT = as.Date(ADTM)) %>% droplevels() ADRSSWIM <- synthetic_cdisc_data("latest")$adrs %>% filter(PARAMCD == "OVRINV") %>% arrange(USUBJID) ADLB <- synthetic_cdisc_data("latest")$adlb ADLB <- ADLB %>% mutate(ADT = as.Date(ADTM), LBSTRESN = as.numeric(gsub("[^0-9]", "", LBSTRESC))) %>% teal.data::col_relabel(ADT = "Analysis Date", LBSTRESN = "Numeric Result/Finding in Standard Units") stopifnot(rlang::hash(ADSL) == "30e7c11581bd01098a8be28263ef4995") stopifnot(rlang::hash(ADAE) == "053de441150756031658354bcb4c5e1f") stopifnot(rlang::hash(ADCM) == "39870731aa07391109532de3de502f83") stopifnot(rlang::hash(ADEX) == "6887d53d4fb51715bca394d3a1be3df0") stopifnot(rlang::hash(ADTR) == "911e50295b9da1b00748237c59a7db0d") stopifnot(rlang::hash(ADTRWF) == "da4e59e3d3784139806daf266eb62a03") stopifnot(rlang::hash(ADRS) == "f62e0c9c40b93e46a4237df3420e4063") stopifnot(rlang::hash(ADRSSWIM) == "c908612cd04930bc523dd7af9e336b4d") stopifnot(rlang::hash(ADLB) == "df08ddc6da6f17adab4996f99c5ae519") ADAE <- dplyr::inner_join(x = ADAE, y = ADSL[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID")) ADCM <- dplyr::inner_join(x = ADCM, y = ADSL[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID")) ADEX <- dplyr::inner_join(x = ADEX, y = ADSL[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID")) ADTR <- dplyr::inner_join(x = ADTR, y = ADSL[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID")) ADTRWF <- dplyr::inner_join(x = ADTRWF, y = ADSL[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID")) ADRS <- dplyr::inner_join(x = ADRS, y = ADSL[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID")) ADRSSWIM <- dplyr::inner_join(x = ADRSSWIM, y = ADSL[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID")) ADLB <- dplyr::inner_join(x = ADLB, y = ADSL[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID")) bar_var <- "PCHG" bar_color_var <- "ARMCD" sort_var <- NULL add_label_var_sl <- NULL add_label_paramcd_rs <- "BESRSPI" anno_txt_var_sl <- NULL anno_txt_paramcd_rs <- "BESRSPI" facet_var <- NULL href_line <- c(-30, 20) gap_point_val <- NULL show_value <- TRUE adsl <- ADSL[, c("USUBJID", "STUDYID", "ARMCD")] adtr <- ADTRWF[, c("USUBJID", "STUDYID", "PARAMCD", "PCHG")] adrs <- ADRS[, c("USUBJID", "STUDYID", "PARAMCD", "AVALC")] bar_tr <- ADTRWF %>% dplyr::filter(PARAMCD == "SLDINV") %>% dplyr::select(USUBJID, PCHG) %>% dplyr::group_by(USUBJID) %>% dplyr::slice(which.min(PCHG)) bar_data <- adsl %>% dplyr::inner_join(bar_tr, "USUBJID") rs_sub <- ADRS %>% dplyr::filter(PARAMCD %in% "BESRSPI") rs_label <- rs_sub %>% dplyr::select(USUBJID, PARAMCD, AVALC) %>% tidyr::pivot_wider(names_from = PARAMCD, values_from = AVALC) anl <- bar_data %>% dplyr::left_join(rs_label, by = c("USUBJID")) anl$USUBJID <- unlist(lapply(strsplit(anl$USUBJID, "-", fixed = TRUE), tail, 1)) plot <- osprey::g_waterfall(bar_id = anl[["USUBJID"]], bar_height = anl[[bar_var]], sort_by = NULL, col_by = anl[[bar_color_var]], bar_color_opt = NULL, anno_txt = data.frame(anl[anno_txt_paramcd_rs]), href_line = c(-30, 20), facet_by = NULL, show_datavalue = TRUE, add_label = anl[[add_label_paramcd_rs]], gap_point = NULL, ytick_at = 20, y_label = "Tumor Burden Change from Baseline", title = "Waterfall Plot") plot ```
sessionInfo() ```r R version 4.3.2 (2023-10-31) Platform: x86_64-apple-darwin20 (64-bit) Running under: macOS Ventura 13.6.3 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 time zone: America/Los_Angeles tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] sparkline_2.0 nestcolor_0.1.2 [3] scda.2022_0.1.5.9005 scda_0.1.6.9015 [5] teal.osprey_0.1.16.9008 osprey_0.1.16.9001 [7] dplyr_1.1.4 teal.modules.general_0.2.16.9018 [9] shinyTree_0.3.1 ggmosaic_0.3.3 [11] ggplot2_3.4.4 teal.modules.clinical_0.8.16.9079 [13] tern_0.9.3.9011 rtables_0.6.6.9006 [15] formatters_0.5.5.9007 teal.transform_0.4.0.9015 [17] magrittr_2.0.3 teal_0.14.0.9042 [19] teal.slice_0.4.0.9040 teal.data_0.4.0.9002 [21] teal.code_0.5.0.9002 shiny_1.8.0 loaded via a namespace (and not attached): [1] tidyselect_1.2.0 viridisLite_0.4.2 fastmap_1.1.1 [4] lazyeval_0.2.2 TH.data_1.1-2 promises_1.2.1 [7] digest_0.6.33 estimability_1.4.1 mime_0.12 [10] tern.gee_0.1.3.9001 lifecycle_1.0.4 ellipsis_0.3.2 [13] survival_3.5-7 compiler_4.3.2 rlang_1.1.2 [16] tools_4.3.2 utf8_1.2.4 data.table_1.14.10 [19] htmlwidgets_1.6.4 multcomp_1.4-25 withr_2.5.2 [22] purrr_1.0.2 geepack_1.3.9 grid_4.3.2 [25] fansi_1.0.6 teal.logger_0.1.3.9010 xtable_1.8-4 [28] colorspace_2.1-0 emmeans_1.8.8 scales_1.3.0 [31] MASS_7.3-60 cli_3.6.2 mvtnorm_1.2-2 [34] generics_0.1.3 rstudioapi_0.15.0 httr_1.4.7 [37] stringr_1.5.1 splines_4.3.2 assertthat_0.2.1 [40] vctrs_0.6.5 Matrix_1.6-1.1 sandwich_3.0-2 [43] jsonlite_1.8.8 ggrepel_0.9.3 plotly_4.10.2 [46] tidyr_1.3.0 glue_1.6.2 codetools_0.2-19 [49] stringi_1.8.3 gtable_0.3.4 later_1.3.2 [52] munsell_0.5.0 tibble_3.2.1 logger_0.2.2 [55] pillar_1.9.0 htmltools_0.5.7 R6_2.5.1 [58] Rdpack_2.5 lattice_0.21-9 rbibutils_2.2.15 [61] backports_1.4.1 broom_1.0.5 httpuv_1.6.13 [64] Rcpp_1.0.11 nlme_3.1-163 checkmate_2.3.1 [67] zoo_1.8-12 pkgconfig_2.0.3 ```

Issue

The problem is that the Show R Code is missing some codes. For example, in ADSL, it's missing

adsl_labels <- teal.data::col_labels(ADSL, fill = FALSE)

and

teal.data::col_labels(ADSL)[c(names(adsl_labels))] <- adsl_labels

adsl_labels is not mentioned in app.R when defining the datanames: https://github.com/insightsengineering/teal.gallery/blob/bc3a1a54863bd301c71df4c7a848a9a27208b11e/early-dev/app.R#L143-L144

However, if I removed this line and let teal.data built the datanames, I still don't see the missing codes because of the current design in .datasets_to_data, specifically, this line: https://github.com/insightsengineering/teal/blob/a7040006853f799245d6cfbcc0a9eafa93fe6a3d/R/module_nested_tabs.R#L253-L256 and https://github.com/insightsengineering/teal/blob/a7040006853f799245d6cfbcc0a9eafa93fe6a3d/R/module_nested_tabs.R#L267 When a module is loaded, it will only get the code for the datanames used in the module and adsl_labels is not part of any module's datasets.

I believe this is the result of trying to pull the code only related to the module's datasets so we don't have repeated code in Show R Code.

Temporarily, we should update teal.gallery to make sure the Show R Code works.

chlebowa commented 9 months ago

adsl_labels is not supposed to be in @datanames because the slot only lists datasets and not auxiliary objects. adsl_labels not being listed in @datanames is exactly the reason there is a @datanames. The issue seems to be that get_code_dependency does not recognize ADSL in teal.data::col_labels(ADSL)[c(names(adsl_labels))] <- adsl_labels. The other line is straightforward and doesn't seem like it should pose a problem..

m7pr commented 9 months ago

Will take a look. I believe it's something off in get_code_dependency in teal.data. If we find a fix for it today, maybe we can resubmit teal.data to CRAN? Should be faster as now the process is automated and there is no manual review.

m7pr commented 9 months ago

Yes, it's the ADSL not being recognized in

teal.data::col_labels(ADSL)[c(names(adsl_labels))] <- adsl_labels

hence above line, and the creation of adsl_labels is not returned.

Making this example simpler:

code <- '
  data(iris)
  names(iris) <- letters[1:5]
'
tdata <- teal_data(code = code)
cat(get_code(tdata, datanames = 'iris'))

In above names(iris) <- letters[1:5] should be returned but only data(iris) is returned.

m7pr commented 9 months ago

Hey, proposed a fix in here https://github.com/insightsengineering/teal.data/pull/289