[Bug]: `{nest_exploratory_dev}` `Error: rlang::hash(ADSL) == "843e317c3d4aeb88062cd39a9c62fe8a" is not TRUE`

m7pr commented 10 months ago

What happened?

Go to https://genentech.shinyapps.io/nest_exploratory_dev/ and open the Missing Data panel. Select "Add **anyna** variable" (the error may occur even if you don't select it). Select "Add summary per patients" (the error may occur even if you don't select it). Click Show R Code.

The code is not runnable:

```
> stopifnot(rlang::hash(ADSL) == "843e317c3d4aeb88062cd39a9c62fe8a")
Error: rlang::hash(ADSL) == "843e317c3d4aeb88062cd39a9c62fe8a" is not TRUE
```

# Add any code to install/load your NEST environment here

library(shiny)
library(ggplot2)
library(ggmosaic)
library(shinyTree)
library(teal.code)
library(teal.data)
library(teal.slice)
library(teal)
library(magrittr)
library(teal.transform)
library(teal.modules.general)
library(scda)
library(scda.2022)
library(dplyr)
library(tidyr)
library(ggExtra)
library(ggpp)
library(ggpmisc)
library(goftest)
library(gridExtra)
library(htmlwidgets)
library(jsonlite)
library(lattice)
library(MASS)
library(rlang)
library(formatters)
library(rtables)
library(nestcolor)
library(broom)
library(colourpicker)
library(sparkline)

library(scda)
library(scda.2022)
library(dplyr)
library(tidyr)
library(ggExtra)
library(ggpmisc)
library(ggpp)
library(goftest)
library(gridExtra)
library(htmlwidgets)
library(jsonlite)
library(lattice)
library(MASS)
library(rlang)
library(rtables)
library(nestcolor)
library(broom)
library(colourpicker)
library(sparkline)
ADSL <- synthetic_cdisc_data("latest")$adsl
ADRS <- synthetic_cdisc_data("latest")$adrs
ADLB <- synthetic_cdisc_data("latest")$adlb
ADLBPCA <- ADLB %>% dplyr::select(USUBJID, STUDYID, SEX, ARMCD, AVAL, AVISIT, PARAMCD) %>% tidyr::pivot_wider(values_from = "AVAL", names_from = c("PARAMCD", "AVISIT"), names_sep = " - ")

stopifnot(rlang::hash(ADSL) == "843e317c3d4aeb88062cd39a9c62fe8a")
stopifnot(rlang::hash(ADRS) == "601b33239cb66543a69dc57dc923e3f9")
stopifnot(rlang::hash(ADLB) == "b237c364adf16523f5db5bb2035409d9")
stopifnot(rlang::hash(ADLBPCA) == "5b30f523f017d178c7b0695346e9b3f5")

ADSL <- dplyr::filter(ADSL, ACTARM %in% c("A: Drug X", "C: Combination"))
ADRS <- dplyr::filter(ADRS, AVALC %in% c("CR", "SD", "NE"))
ADRS <- dplyr::inner_join(x = ADRS, y = ADSL[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID"))
ADLB <- dplyr::inner_join(x = ADLB, y = ADSL[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID"))

ANL <- ADSL
create_cols_labels <- function(cols, just_label = FALSE) {
    column_labels <- c(STUDYID = "Study Identifier", USUBJID = "Unique Subject Identifier", ADTHAUT = "Autopsy Performed", DTHDT = "Date of Death", DTHCAUS = "Cause of Death", DTHCAT = "Cause of Death Category", LDDTHELD = "Elapsed Days from Last Dose to Death", LDDTHGR1 = "Last Dose to Death - Days Elapsed Grp 1", DTHADY = "Relative Day of Death", DCSREAS = "Reason for Discontinuation from Study", TRTEDTM = "Datetime of Last Exposure to Treatment", TRT01EDTM = "Datetime of Last Exposure in Period 01", 
    TRT02SDTM = "Datetime of First Exposure to Treatment in Period 02", TRT02EDTM = "Datetime of Last Exposure to Treatment in Period 02", AP01EDTM = "Period 01 End Datetime", AP02SDTM = "Period 02 Start Datetime", AP02EDTM = "Period 02 End Datetime", EOSDT = "End of Study Date", EOSDY = "End of Study Relative Day", LSTALVDT = "Date Last Known Alive", SUBJID = "Subject Identifier for the Study", SITEID = "Study Site Identifier", AGE = "Age", AGEU = "Age Units", SEX = "Sex", RACE = "Race", ETHNIC = "Ethnicity", 
    COUNTRY = "Country", DTHFL = "Subject Death Flag", INVID = "Investigator Identifier", INVNAM = "Investigator Name", ARM = "Description of Planned Arm", ARMCD = "Planned Arm Code", ACTARM = "Description of Actual Arm", ACTARMCD = "Actual Arm Code", TRT01P = "Planned Treatment for Period 01", TRT01A = "Actual Treatment for Period 01", TRT02P = "Planned Treatment for Period 02", TRT02A = "Actual Treatment for Period 02", REGION1 = "Geographic Region 1", STRATA1 = "Stratification Factor 1", STRATA2 = "Stratification Factor 2", 
    BMRKR1 = "Continuous Level Biomarker 1", BMRKR2 = "Categorical Level Biomarker 2", ITTFL = "Intent-To-Treat Population Flag", SAFFL = "Safety Population Flag", BMEASIFL = "Response Evaluable Population Flag", BEP01FL = "Biomarker Evaluable Population Flag", AEWITHFL = "AE Leading to Drug Withdrawal Flag", RANDDT = "Date of Randomization", TRTSDTM = "Datetime of First Exposure to Treatment", TRT01SDTM = "Datetime of First Exposure to Treatment in Period 01", AP01SDTM = "Period 01 Start Datetime", 
    EOSSTT = "End of Study Status", EOTSTT = "End of Treatment Status", new_col_name = "**anyna**")
    column_labels[is.na(column_labels) | length(column_labels) == 0] <- ""
    if (just_label) {
        labels <- column_labels[cols]
    }
    else {
        labels <- ifelse(cols == "**anyna**" | cols == "", cols, paste0(column_labels[cols], " [", cols, "]"))
    }
    return(labels)
}
ANL[["**anyna**"]] <- ifelse(rowSums(is.na(ANL)) > 0, NA, FALSE)
analysis_vars <- setdiff(colnames(ANL), c(ADSL.STUDYID = "STUDYID", ADSL.USUBJID = "USUBJID", ADLB.STUDYID = "STUDYID", ADLB.USUBJID = "USUBJID", ADRS.STUDYID = "STUDYID", ADRS.USUBJID = "USUBJID"))
summary_plot_obs <- ANL[, analysis_vars] %>% dplyr::summarise_all(list(function(x) sum(is.na(x)))) %>% tidyr::pivot_longer(tidyselect::everything(), names_to = "col", values_to = "n_na") %>% dplyr::mutate(n_not_na = nrow(ANL) - n_na) %>% tidyr::pivot_longer(-col, names_to = "isna", values_to = "n") %>% dplyr::mutate(isna = isna == "n_na", n_pct = n/nrow(ANL) * 100)
x_levels <- dplyr::filter(summary_plot_obs, isna) %>% dplyr::arrange(n_pct, dplyr::desc(col)) %>% dplyr::pull(col) %>% create_cols_labels()
x_levels <- c(setdiff(x_levels, "**anyna**"), "**anyna**")
p1 <- summary_plot_obs %>% ggplot() + aes(x = factor(create_cols_labels(col), levels = x_levels), y = n_pct, fill = isna) + geom_bar(position = "fill", stat = "identity") + scale_fill_manual(name = "", values = c("grey90", c(getOption("ggplot2.discrete.colour")[2], "#ff2951ff")[1]), labels = c("Present", "Missing")) + scale_y_continuous(labels = scales::percent_format(), breaks = seq(0, 1, by = 0.1), expand = c(0, 0)) + geom_text(aes(label = ifelse(isna == TRUE, sprintf("%d [%.02f%%]", n, n_pct), 
    ""), y = 1), hjust = 1, color = "black") + ggplot2::labs(caption = "NEST PROJECT", x = "Variable", y = "Missing observations") + ggplot2::theme_classic() + ggplot2::theme(legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1)) + coord_flip()
parent_keys <- NULL
ndistinct_subjects <- dplyr::n_distinct(ANL[, parent_keys])
summary_plot_patients <- ANL[, c(parent_keys, analysis_vars)] %>% dplyr::group_by_at(parent_keys) %>% dplyr::summarise_all(anyNA) %>% tidyr::pivot_longer(cols = !tidyselect::all_of(parent_keys), names_to = "col", values_to = "anyna") %>% dplyr::group_by_at(c("col")) %>% dplyr::summarise(count_na = sum(anyna)) %>% dplyr::mutate(count_not_na = ndistinct_subjects - count_na) %>% tidyr::pivot_longer(-c(col), names_to = "isna", values_to = "n") %>% dplyr::mutate(isna = isna == "count_na", n_pct = n/ndistinct_subjects * 
    100) %>% dplyr::arrange_at(c("isna", "n"), .funs = dplyr::desc)
p2 <- summary_plot_patients %>% ggplot() + aes_(x = ~factor(create_cols_labels(col), levels = x_levels), y = ~n_pct, fill = ~isna) + geom_bar(alpha = 1, stat = "identity", position = "fill") + scale_y_continuous(labels = scales::percent_format(), breaks = seq(0, 1, by = 0.1), expand = c(0, 0)) + scale_fill_manual(name = "", values = c("grey90", c(getOption("ggplot2.discrete.colour")[2], "#ff2951ff")[1]), labels = c("Present", "Missing")) + geom_text(aes(label = ifelse(isna == TRUE, sprintf("%d [%.02f%%]", 
    n, n_pct), ""), y = 1), hjust = 1, color = "black") + ggplot2::labs(caption = "NEST PROJECT", x = "", y = "Missing patients") + ggplot2::theme_classic() + ggplot2::theme(legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1), axis.text.y = element_blank()) + coord_flip()
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(p2)
g <- gridExtra::gtable_cbind(g1, g2, size = "first")
g$heights <- grid::unit.pmax(g1$heights, g2$heights)
grid::grid.newpage()
grid::grid.draw(g)


donyunardi commented 9 months ago

I can't reproduce this error as I am able to run the code and get the visualization locally.

Could you provide your sessionInfo()? I need to make sure you have the same scda version as the one running in _dev.

Here's mine

```r
R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] sparkline_2.0                    colourpicker_1.3.0
 [3] broom_1.0.5                      nestcolor_0.1.2
 [5] rtables_0.6.6.9006               formatters_0.5.5.9007
 [7] rlang_1.1.2                      MASS_7.3-60
 [9] lattice_0.21-9                   jsonlite_1.8.8
[11] htmlwidgets_1.6.4                gridExtra_2.3
[13] goftest_1.2-3                    ggpmisc_0.5.4-1
[15] ggpp_0.5.4                       ggExtra_0.10.1
[17] tidyr_1.3.0                      dplyr_1.1.4
[19] scda.2022_0.1.5.9005             scda_0.1.6.9015
[21] teal.modules.general_0.2.16.9020 teal.transform_0.4.0.9017
[23] magrittr_2.0.3                   teal_0.15.0.9002
[25] teal.slice_0.5.0.9001            teal.data_0.4.0.9005
[27] teal.code_0.5.0.9003             shinyTree_0.3.1
[29] ggmosaic_0.3.3                   ggplot2_3.4.4
[31] shiny_1.8.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.4           ggrepel_0.9.3          vctrs_0.6.5
 [4] tools_4.3.2            generics_0.1.3         tibble_3.2.1
 [7] fansi_1.0.6            pkgconfig_2.0.3        Matrix_1.6-1.1
[10] checkmate_2.3.1        data.table_1.14.10     lifecycle_1.0.4
[13] farver_2.1.1           compiler_4.3.2         stringr_1.5.1
[16] MatrixModels_0.5-2     munsell_0.5.0          SparseM_1.81
[19] httpuv_1.6.13          quantreg_5.97          htmltools_0.5.7
[22] lazyeval_0.2.2         plotly_4.10.2          crayon_1.5.2
[25] later_1.3.2            pillar_1.9.0           ellipsis_0.3.2
[28] mime_0.12              tidyselect_1.2.0       digest_0.6.33
[31] stringi_1.8.3          purrr_1.0.2            splines_4.3.2
[34] fastmap_1.1.1          grid_4.3.2             colorspace_2.1-0
[37] cli_3.6.2              logger_0.2.2           survival_3.5-7
[40] utf8_1.2.4             withr_2.5.2            backports_1.4.1
[43] scales_1.3.0           promises_1.2.1         httr_1.4.7
[46] miniUI_0.1.1.1         viridisLite_0.4.2      Rcpp_1.0.11
[49] xtable_1.8-4           glue_1.6.2             polynom_1.4-1
[52] rstudioapi_0.15.0      teal.logger_0.1.3.9011 R6_2.5.1
```

m7pr commented 9 months ago

@donyunardi I'm not saying I can't get the visualization. I do get it when I skip the part below:

```r
stopifnot(rlang::hash(ADSL) == "843e317c3d4aeb88062cd39a9c62fe8a")
stopifnot(rlang::hash(ADRS) == "601b33239cb66543a69dc57dc923e3f9")
stopifnot(rlang::hash(ADLB) == "b237c364adf16523f5db5bb2035409d9")
stopifnot(rlang::hash(ADLBPCA) == "5b30f523f017d178c7b0695346e9b3f5")
```

Do you get errors for this part?

As for scda: the package's code hasn't changed in a year, so I doubt it's the cause.

My session info is below:


```r
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_Europe.utf8
[2] LC_CTYPE=English_Europe.utf8
[3] LC_MONETARY=English_Europe.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_Europe.utf8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils
[5] datasets  methods   base

other attached packages:
 [1] sparkline_2.0
 [2] colourpicker_1.2.0
 [3] broom_1.0.5
 [4] nestcolor_0.1.2.9001
 [5] rtables_0.6.6
 [6] formatters_0.5.5
 [7] rlang_1.1.1
 [8] MASS_7.3-58.4
 [9] lattice_0.21-8
[10] jsonlite_1.8.7
[11] htmlwidgets_1.6.2
[12] gridExtra_2.3
[13] goftest_1.2-3
[14] ggpmisc_0.5.3
[15] ggpp_0.5.2
[16] ggExtra_0.10.0
[17] tidyr_1.3.0
[18] dplyr_1.1.2
[19] scda.2022_0.1.5.9000
[20] scda_0.1.6.9008
[21] teal.modules.general_0.2.16.9018
[22] teal.transform_0.4.0.9016
[23] magrittr_2.0.3
[24] teal_0.15.0.9002
[25] teal.slice_0.5.0
[26] teal.data_0.4.0.9001
[27] teal.code_0.5.0.9001
[28] shinyTree_0.2.7
[29] ggmosaic_0.3.3
[30] ggplot2_3.4.2
[31] shiny_1.7.5

loaded via a namespace (and not attached):
 [1] gtable_0.3.3           ggrepel_0.9.3
 [3] vctrs_0.6.2            tools_4.3.0
 [5] generics_0.1.3         tibble_3.2.1
 [7] fansi_1.0.4            pkgconfig_2.0.3
 [9] Matrix_1.6-3           checkmate_2.2.0
[11] data.table_1.14.8      lifecycle_1.0.3
[13] farver_2.1.1           compiler_4.3.0
[15] stringr_1.5.0          MatrixModels_0.5-2
[17] munsell_0.5.0          SparseM_1.81
[19] httpuv_1.6.11          quantreg_5.95
[21] htmltools_0.5.5        lazyeval_0.2.2
[23] plotly_4.10.2          crayon_1.5.2
[25] later_1.3.1            pillar_1.9.0
[27] ellipsis_0.3.2         mime_0.12
[29] tidyselect_1.2.0       digest_0.6.33
[31] stringi_1.7.12         purrr_1.0.1
[33] splines_4.3.0          fastmap_1.1.1
[35] grid_4.3.0             colorspace_2.1-0
[37] cli_3.6.1              logger_0.2.2
[39] survival_3.5-5         utf8_1.2.3
[41] withr_2.5.0            backports_1.4.1
[43] scales_1.2.1           promises_1.2.0.1
[45] httr_1.4.7             miniUI_0.1.1.1
[47] viridisLite_0.4.2      Rcpp_1.0.10
[49] xtable_1.8-4           glue_1.6.2
[51] polynom_1.4-1          rstudioapi_0.15.0
[53] teal.logger_0.1.3.9007 R6_2.5.1
```

m7pr commented 9 months ago

Yours: scda.2022_0.1.5.9005 scda_0.1.6.9015
Mine: scda.2022_0.1.5.9000 scda_0.1.6.9008

Mine looks outdated, but the latest changes are mainly workflow propagations: https://github.com/insightsengineering/scda/commits/main/

m7pr commented 9 months ago

If we add rlang::hash to the generated code and the dataset depends on the scda version, should we also add a comment about the scda version somewhere? By the way, should we use the latest release of scda (0.1.6)?

m7pr commented 9 months ago

Huh, with updated versions of scda and scda.2022 this passes without errors.

Should we put a limit on scda and scda.2022 versions in those apps? So instead of just library(scda) we would have:

```r
library(scda)
stopifnot('scda version should be higher than 0.1.6.9010' = packageVersion('scda') > '0.1.6.9010')
```

?
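The proposed guard relies on how R compares version strings. A minimal sketch of that behaviour, using the two scda versions from the session infos above and the `0.1.6.9010` threshold proposed here (the variable names are illustrative only):

```r
# scda versions reported in the two session infos above.
installed <- c(mine = "0.1.6.9008", yours = "0.1.6.9015")
threshold <- "0.1.6.9010"

# package_version() objects (what utils::packageVersion() returns) compare
# component by component: 0, 1, 6, then 9008 or 9015 against 9010.
passes <- setNames(package_version(installed) > threshold, names(installed))
print(passes)  # mine: FALSE, yours: TRUE

# Plain character comparison is not a substitute for version comparison:
text_cmp <- "0.1.10" > "0.1.9"                       # FALSE, compared as text
version_cmp <- package_version("0.1.10") > "0.1.9"   # TRUE, compared as versions
```

With this threshold, the older session would stop at the guard with an explicit message instead of failing later at the hash check.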

pawelru commented 9 months ago

A few things here:

- scda is a package that provides accessor functions for the scda.xyz data packages (e.g. scda.2022).
- scda.2022 is the package where the datasets are stored.

If you have a different version of scda.2022, then, unsurprisingly, you may get different dataset objects and therefore a different hash. To get a match, you need the same package versions as the app you are referring to (the latest release for the stable deployment and main for the devel deployment). (For simplicity, I'm ignoring maintenance commits that only bump a package's version.)
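To illustrate why a different scda.2022 version breaks the check, here is a minimal sketch with toy data frames (hypothetical stand-ins, not the real ADSL). rlang::hash() returns a 32-character hex digest that changes with any change to the object, including metadata such as a column label, while an identically built object hashes the same:

```r
library(rlang)

# Hypothetical stand-ins for the same dataset shipped by two package versions.
adsl_old <- data.frame(USUBJID = c("S1", "S2"), AGE = c(34, 51))

# Version with a single changed value.
adsl_new <- adsl_old
adsl_new$AGE[2] <- 52

# Version with identical values but an added label attribute.
adsl_labelled <- adsl_old
attr(adsl_labelled$AGE, "label") <- "Age"

h_old <- rlang::hash(adsl_old)
h_new <- rlang::hash(adsl_new)
h_labelled <- rlang::hash(adsl_labelled)

# Rebuilding the same object from scratch reproduces the same hash.
h_rebuilt <- rlang::hash(data.frame(USUBJID = c("S1", "S2"), AGE = c(34, 51)))

identical(h_old, h_rebuilt)   # TRUE
identical(h_old, h_new)       # FALSE
identical(h_old, h_labelled)  # FALSE
```

So even a data regeneration that only touches labels or a handful of values would make the stored hashes in Show R Code fail.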

Should we put a limit on scda and scda.2022 versions in those apps?

In the core functionality? No, definitely not: not all apps use scda datasets. In the app code? To be honest, I don't like it. The line you proposed would become outdated as soon as the scda datasets change.

I think you've hit a known limitation of the Show R Code functionality: the code alone does not guarantee full reproducibility. You also need the same environment (package versions, system dependencies, etc.). It's quite unfortunate that this ends up in the top-level code component.

This has been identified already and there is an issue to address this problem that is waiting patiently for its time: https://github.com/insightsengineering/coredev-tasks/issues/479

m7pr commented 9 months ago

Totally! Code alone is certainly not enough for reproducibility; the environment is key. That's why I wonder why we insist on having stopifnot(rlang::hash(...)) in Show R Code at all, as it brings more confusion than benefit. On the other hand, we were able to troubleshoot the discrepancy between package versions, so maybe at least we could have a button for Show R Session to repeat this investigation. That would help someone working alone, with no one else to help, compare their own environment with the app's. I think a functionality that shows R session info should be really quick to develop.

pawelru commented 9 months ago

maybe at least we could have a button for Show R Session to repeat this investigation

How would the app be able to access your local environment? :)

This would have to be code that you evaluate yourself, and that code would have to contain the app's package versions, so it would be quite lengthy... IMHO, SRC is already too cluttered with library calls. Multiply that by two? Oh gosh, nooo...

UPDATE: actually, library(foo); stopifnot(packageVersion("foo") == "1.2.3") is not that bad, i.e. move the check up next to each library call rather than appending it at the bottom.

UPDATE2: it is actually bad. The goal is to give analysis-like code, and this looks very technical, so no.

I think a functionality that shows R session info should be really quick to develop.

You mean to have session info from within the app? It's already there in the footer.

That's why I wonder why we insist on having stopifnot(rlang::hash(...)) in Show R Code at all, as it brings more confusion than benefit

To be honest, I don't recall the exact reasons. Probably it's because we wanted to fail fast before the output generation call in case there are differences in the datasets used. If the datasets differ, it's unlikely that we will get the same outputs.

donyunardi commented 9 months ago

Do you get errors for this part?

No, that part runs just fine.

Probably it's because we wanted to fail fast before the output generation call in case there are differences in the datasets used.

Correct, I recall that this was the reason we use rlang::hash: to ensure that the reproducible code matches what you see in the teal app.
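A hedged sketch of the same fail-fast idea with a more informative failure than a bare stopifnot(). `check_dataset_hash()` is a hypothetical helper, not part of teal or rlang; it points the user at package versions, which is where this issue ended up:

```r
library(rlang)

# Hypothetical helper: stop early, with a readable message, when a dataset
# differs from the one the app used to produce its outputs.
check_dataset_hash <- function(data, expected, name = deparse1(substitute(data))) {
  actual <- rlang::hash(data)
  if (!identical(actual, expected)) {
    stop(
      sprintf(
        paste(
          "Dataset '%s' differs from the one used in the app (hash %s, expected %s).",
          "Compare your package versions (e.g. scda.2022) with the app's session info."
        ),
        name, actual, expected
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# In generated code this would replace the bare checks, e.g.:
# check_dataset_hash(ADSL, "843e317c3d4aeb88062cd39a9c62fe8a")
```

The check still fails in exactly the same situations as stopifnot(); only the message changes.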

m7pr commented 9 months ago

maybe at least we could have a button for Show R Session to repeat this investigation

How would the app be able to access your local environment? :)

No! I meant checking your own session yourself, and the app's session info wherever the app shows it. I totally missed that there is session info in the footer.
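For that do-it-yourself check, a minimal sketch: copy the package versions from the app's session info (footer) into a named vector and list the ones that differ locally. `compare_package_versions()` is a hypothetical helper; versions are compared as package_version objects, so "7.3-60" and "7.3.60" count as equal, and packages that are not installed show up with `local = NA`:

```r
# Hypothetical helper: return the packages whose local version differs from
# the version listed in the app's session info (or that are not installed).
compare_package_versions <- function(expected) {
  local <- vapply(names(expected), function(pkg) {
    tryCatch(as.character(utils::packageVersion(pkg)),
             error = function(e) NA_character_)
  }, character(1))

  out <- data.frame(
    package = names(expected),
    app = unname(expected),
    local = unname(local),
    stringsAsFactors = FALSE
  )
  out$match <- vapply(seq_len(nrow(out)), function(i) {
    !is.na(out$local[i]) &&
      package_version(out$local[i]) == package_version(out$app[i])
  }, logical(1))

  out[!out$match, c("package", "app", "local"), drop = FALSE]
}

# Example with versions posted earlier in this thread:
# compare_package_versions(c(scda = "0.1.6.9015", scda.2022 = "0.1.5.9005"))
```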

m7pr commented 9 months ago

In the end, this was a discrepancy in package versions, which I could have spotted in the session info in the app's footer. I don't think this is a bug in the app. Closing.

insightsengineering / teal.gallery