insightsengineering / goshawk

Functions that plot and summarize biomarkers/labs of interest
https://insightsengineering.github.io/goshawk/
Other
5 stars 1 forks source link

boxplot: order of color is not respected with ggplot2 v3.3.2 #128

Closed mhallal1 closed 2 years ago

mhallal1 commented 2 years ago

UAT of goshawk app /content/6b17abda-6ac5-458a-9f73-d8c70b565280/

  1. Go to goshawk app
  2. Go boxplot module tab
  3. change the second color to black or green --> the order is not respected. UPDATE: this is the same case in correlation, density correlation and spaghetti plots also.

Screenshot from 2022-04-01 18-09-47

kpagacz commented 2 years ago

I cannot reproduce this locally. It changes the colours back and forth for me (on the plot and the legend).

nikolas-burkoff commented 2 years ago

So on the sample app I see this: image

Running the SRC on BEE: image

But running it locally: image

So I have no idea at all what is happening here.

The key part of the SRC is p <- goshawk::g_boxplot(data = ANL, biomarker = "ALT", xaxis_var = "ARM", yaxis_var = "AVAL", hline_arb = c(10, 30), hline_arb_label = c("A", "B"), hline_arb_color = c("red", "green"),

library(shiny)
library(reticulate)
library(DescTools)
library(magrittr)
library(dplyr)
library(scda)
library(scda.2021)
library(stringr)
library(goshawk)
library(teal.data)
library(teal.transform)
library(teal)
library(teal.goshawk)
library(ggplot2)
library(formatters)
library(rtables)
library(tern)
library(teal.modules.clinical)
library(ggmosaic)
library(shinyTree)
library(teal.modules.general)

shape_manual <- c(N = 1, Y = 2, `NA` = 0)
param_choices <- c("ALT", "CRP", "IGA")
exclude_l2 <- c("")
exclude_chg <- c("")
arm_mapping <- list(`A: Drug X` = "Drug X 100mg", `B: Placebo` = "Placebo", `C: Combination` = "Combination 100mg")
color_manual <- c(`Drug X 100mg` = "#1e90ff", Placebo = "#ffa07a", `Combination 100mg` = "#bb9990")
`%keep_label%` <- function(lhv, rhv) {
  attributes(lhv) <- attributes(rhv)
  lhv
}
`%make_label%` <- function(lhv, label) {
  attr(lhv, "label") <- label
  lhv
}
ADSL <- synthetic_cdisc_data("latest")$adsl %>%
  filter(ITTFL == "Y") %>%
  mutate(TRTORD = case_when(ARMCD == "ARM A" ~ 1, ARMCD == "ARM B" ~ 2, ARMCD == "ARM C" ~ 3, TRUE ~ as.numeric(NA)), TRTORD = TRTORD %make_label% "Treatment Order", ARM = as.character(arm_mapping[match(ARM, names(arm_mapping))]), ARM = factor(ARM) %>% reorder(TRTORD), ARM = ARM %make_label% "Planned ARM")
adsl_labels <- formatters::var_labels(ADSL, fill = FALSE)
date_vars_adsl <- names(ADSL)[vapply(ADSL, function(x) inherits(x, c("Date", "POSIXct", "POSIXlt")), logical(1))]
char_vars_adsl <- names(Filter(isTRUE, sapply(ADSL, is.character)))
ADSL <- ADSL %>% mutate_at(char_vars_adsl, factor)
formatters::var_labels(ADSL) <- c(adsl_labels)
ADLB_SUBSET <- synthetic_cdisc_data("latest")$adlb %>%
  filter(!is.na(AVAL)) %>%
  filter(PARAMCD %in% c(param_choices) & ITTFL == "Y" & toupper(AVISIT) %like any% c("SCREEN%", "BASE%", "%WEEK%", "%FOLLOW%")) %>%
  select(c("STUDYID", "USUBJID", "ITTFL", "ARM", "ARMCD", "ACTARM", "ACTARMCD", "AVISIT", "AVISITN", "ADY", "PARAM", "PARAMCD", "AVAL", "AVALU", "BASE", "CHG", "PCHG", "LBSTRESC", "SEX", "RACE", "LOQFL")) %>%
  mutate(AVISITCD = case_when(toupper(AVISIT) == "SCREENING" ~ "SCR", toupper(AVISIT) ==
                                "BASELINE" ~ "BL", grepl("WEEK", toupper(AVISIT)) ~ paste("W", trimws(substr(AVISIT, start = 6, stop = str_locate(AVISIT, "DAY") - 1))), grepl("FOLLOW", toupper(AVISIT)) ~ "FU", TRUE ~ as.character(NA)), AVISITCDN = case_when(AVISITCD == "SCR" ~ -2, AVISITCD == "BL" ~ 0, grepl("W", AVISITCD) ~ as.numeric(gsub("[^0-9]+", "", AVISITCD)) * 7, AVISITCD == "FU" ~ 100, TRUE ~ as.numeric(NA)), ANRHI = case_when(PARAMCD == "ALT" ~ 60, PARAMCD == "CRP" ~ 70, PARAMCD == "IGA" ~ 80, TRUE ~ NA_real_), ANRLO = case_when(PARAMCD ==
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   "ALT" ~ 20, PARAMCD == "CRP" ~ 30, PARAMCD == "IGA" ~ 40, TRUE ~ NA_real_), TRTORD = case_when(ARMCD == "ARM A" ~ 1, ARMCD == "ARM B" ~ 2, ARMCD == "ARM C" ~ 3, TRUE ~ as.numeric(NA)), LOQFL = if_else(as.character(LOQFL) == "Y", as.character(LOQFL), "N"), BASE2 = NA, CHG2 = NA, PCHG2 = NA) %>%
  rowwise() %>%
  group_by(PARAMCD) %>%
  mutate(LBSTRESC = ifelse(USUBJID %in% sample(USUBJID, 1, replace = TRUE), paste("<", round(runif(1, min = 25, max = 30))), LBSTRESC)) %>%
  mutate(LBSTRESC = ifelse(USUBJID %in%
                             sample(USUBJID, 1, replace = TRUE), paste(">", round(runif(1, min = 70, max = 75))), LBSTRESC)) %>%
  ungroup()
attr(ADLB_SUBSET[["LBSTRESC"]], "label") <- "Character Result/Finding in Std Format"
attr(ADLB_SUBSET[["ANRLO"]], "label") <- "Analysis Normal Range Lower Limit"
attr(ADLB_SUBSET[["ANRHI"]], "label") <- "Analysis Normal Range Upper Limit"
color_comb <- "#39ff14"
x_tick_num <- c(-2, 0, 7, 14, 21, 28, 35)
x_tick_label <- c("Screening", "Baseline", "Week 1", "Week 2", "Week 3", "Week 4", "Week 5")
PARAM_MINS <- ADLB_SUBSET %>%
  select(USUBJID, PARAMCD, AVAL) %>%
  filter(PARAMCD %in% param_choices) %>%
  group_by(PARAMCD) %>%
  summarise(AVAL_MIN = min(AVAL, na.rm = TRUE), .groups = "drop") %>%
  mutate(PARAMCD = PARAMCD %make_label% "Parameter Code")
ADLB_SUPED1 <- ADLB_SUBSET %>%
  mutate(BASE2 = ifelse(AVISIT == "SCREENING" & is.na(BASE2), AVAL, BASE2) %keep_label% BASE2) %>%
  mutate(CHG2 = ifelse(AVISIT == "SCREENING" & is.na(CHG2), 0, CHG2) %keep_label% CHG2) %>%
  mutate(PCHG2 = ifelse(AVISIT == "SCREENING" & is.na(PCHG2), 0, PCHG2) %keep_label% PCHG2) %>%
  mutate(BASE = ifelse(AVISIT == "BASELINE" & is.na(BASE), AVAL, BASE) %keep_label% BASE) %>%
  mutate(CHG = ifelse(AVISIT == "BASELINE" & is.na(CHG), 0, CHG) %keep_label% CHG) %>%
  mutate(PCHG = ifelse(AVISIT ==
                         "BASELINE" & is.na(PCHG), 0, PCHG) %keep_label% PCHG) %>%
  mutate(TRTORD = TRTORD %make_label% "Treatment Order")
ADLB_SUPED2 <- inner_join(PARAM_MINS, ADLB_SUPED1, by = "PARAMCD")[, union(names(ADLB_SUPED1), names(PARAM_MINS))] %>%
  mutate(AVALL2 = ifelse(PARAMCD %in% exclude_l2, AVAL, ifelse(PARAMCD %in% exclude_chg, NA, ifelse(AVAL == 0 & AVAL_MIN > 0, log2(AVAL_MIN / 2), ifelse(AVAL == 0 & AVAL_MIN <= 0, NA, ifelse(AVAL > 0, log2(AVAL), NA))))) %make_label% "Log2 of AVAL") %>%
  mutate(BASEL2 = ifelse(PARAMCD %in% exclude_l2, BASE, ifelse(PARAMCD %in% exclude_chg, NA, ifelse(BASE == 0 & AVAL_MIN > 0, log2(AVAL_MIN / 2),
                                                                                                    ifelse(BASE == 0 & AVAL_MIN <= 0, NA, ifelse(BASE > 0, log2(BASE), NA))
  ))) %make_label% "Log2 of BASE") %>%
  mutate(BASE2L2 = ifelse(PARAMCD %in% exclude_l2, BASE2, ifelse(PARAMCD %in% exclude_chg, NA, ifelse(BASE2 == 0 & AVAL_MIN > 0, log2(AVAL_MIN / 2), ifelse(BASE2 == 0 & AVAL_MIN <= 0, NA, ifelse(BASE2 > 0, log2(BASE2), NA))))) %make_label% "Log2 of BASE2") %>%
  mutate(AVAL_MIN = AVAL_MIN %make_label% "Minimum AVAL Within PARAMCD")
ADLB <- ADLB_SUPED2 %>% mutate(ARM = as.character(arm_mapping[match(ARM, names(arm_mapping))]), ARM = factor(ARM) %>% reorder(TRTORD) %make_label% "Planned Arm", ACTARM = as.character(arm_mapping[match(ACTARM, names(arm_mapping))]), ACTARM = factor(ACTARM) %>% reorder(TRTORD) %make_label% "Description of Actual Arm", LOQFL = LOQFL %make_label% "Limit of Quantification", AVISITCD = factor(AVISITCD) %>% reorder(AVISITCDN) %make_label% "Analysis Visit Window Code", AVISITCDN = AVISITCDN %make_label%
                                 "Analysis Visit Window Code (N)", BASE2 = BASE2 %make_label% "Screening Value", CHG2 = CHG2 %make_label% "Absolute Change from Screening", PCHG2 = PCHG2 %make_label% "Percent Change from Screening")
ADLB_LOQS <- goshawk:::h_identify_loq_values(ADLB)
ADLB <- left_join(ADLB, ADLB_LOQS, by = "PARAM")

## NOTE: Reproducibility of data import and preprocessing was not
## explicitly checked (argument "check = FALSE" is set).
## The app developer has the choice to check the reproducibility
## and might have omitted this step for some reason. Please reach
## out to the app developer for details.

# ADSL MD5 hash at the time of analysis: fbed0c4bb0fead0913e6b5f59344c394
# ADLB MD5 hash at the time of analysis: dc28bb920572904fc345166c4105dcde

ADSL_FILTERED <- ADSL
ADLB_FILTERED_ALONE <- ADLB
ADLB_FILTERED <- dplyr::inner_join(x = ADLB_FILTERED_ALONE, y = ADSL_FILTERED[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID"))

ANL <- ADLB_FILTERED %>% dplyr::filter(PARAMCD == "ALT")

p <- goshawk::g_boxplot(data = ANL, biomarker = "ALT", xaxis_var = "ARM", yaxis_var = "AVAL", hline_arb = c(10, 30), hline_arb_label = c("A", "B"), hline_arb_color = c("red", "green"), hline_vars = NULL, hline_vars_colors = character(0), hline_vars_labels = character(0), facet_ncol = NA, loq_legend = TRUE, rotate_xlab = TRUE, trt_group = "ARM", ylim = c(5L, 35L), color_manual = c(`Drug X 100mg` = "#1e90ff", Placebo = "#ffa07a", `Combination 100mg` = "#bb9990"), shape_manual = c(N = 1, Y = 2, `NA` = 0), facet_var = "AVISITCD", alpha = 0.8, dot_size = 2L, font_size = 12L, unit = "AVALU")
tbl <- goshawk::t_summarytable(data = ANL, trt_group = "ARM", param_var = "PARAMCD", param = "ALT", xaxis_var = "AVAL", facet_var = "AVISITCD")
print(p)
nikolas-burkoff commented 2 years ago

ok so if you use ggplot2 version 3.3.2 as is on BEE you get the error (reproduced locally) - using 3.3.5 and maybe others you no longer get the error.

So to fix this, we would I believe require an environment update @KlaudiaBB @gogonzo unless there's some workaround with the version of ggplot2 installed on our release on BEE

FYI @npaszty

npaszty commented 2 years ago

@nikolas-burkoff good catch. I think we had some similar ordering issues like this in much earlier in the development of the line attribute enhancements. I too see this behavior but it looked like it can sometimes be sorted by replacing the entire color order otherwise the colors seem to go backwards as soon as they are changed. same across all modules. Seems like the only way to fix is to reset all three horizontal line related UI elements.

looks like 2022_01_28 is okay.

nikolas-burkoff commented 2 years ago

ok so after agreement we won't upgrade today as we have environment freeze but will upgrade straight after this release sprint so moving to top of our backlog

gogonzo commented 2 years ago

@nikolas-burkoff we might consider fixing it in this release. But we need to know if we can quickly apply changes to NEST_releases yaml.

Could you please go to BEE and check if reinstalling ggplot2 causes reinstallation of dependencies? I hope upgrading ggplot2 does not cause deps upgrade.

Quick way to check it.

# mkdir ~/temp_lib
.libPaths(
  c("~/temp_lib", 
     "<latest UAT lib>", 
     .libPaths()
  )
)

# overwrite ggplot2
install.packages("ggplot2") # shoulld install version 3.3.2 or higher

# check what has been installed 
installed.packages("~/temp_lib")

On current libs setup please run the app and examine if problem is solved.

Polkas commented 2 years ago

This is quicker:

> pacs::pac_compare_versions("ggplot2", "3.3.2", "3.3.5")
     Package Version.3.3.2 Version.3.3.5 version_status
1     digest                                          0
2       glue                                          0
3  grDevices                                          0
4       grid                                          0
5     gtable         0.1.1         0.1.1              0
6    isoband                                          0
7       MASS                                          0
8       mgcv                                          0
9          R           3.2           3.3              1
10     rlang         0.3.0        0.4.10              1
11    scales         0.5.0         0.5.0              0
12     stats                                          0
13    tibble                                          0
14     withr         2.0.0         2.0.0              0

The only update is for rlang to 0.4.10

gogonzo commented 2 years ago

what version of rlang we have on BEE? Above doesn't mean that we have rlang 3.0 on BEE

Polkas commented 2 years ago

On BEE we have rlang 0.4.12 so everything looks ok. Could sb confirm that:)

gogonzo commented 2 years ago

I asked also to make sure that overinstalling ggplot2 is enough for the goshawk app. pacs itself can't answer this


Could you make a PR to NEST_releases/nest_dependencies/[latest dependencies yaml for bee and rocker] and upgrade ggplot2 there?

Contact Arkadiusz or Tomasz to run deploy_on_bee pipeline to check.

Polkas commented 2 years ago

updating only the ggplot2 to 3.3.5 is enough for me. I will check how it looks like from the module perspective.

Polkas commented 2 years ago

looks to come from https://github.com/insightsengineering/goshawk/blob/44e370b624527c250d66a970dd6ae07d3131723e/R/geom_axes_line.R#L223

gogonzo commented 2 years ago

@Polkas it comes from the color_scale_manual - where values you pointed are mapped to color and label

Polkas commented 2 years ago

I have a feeling that a dependency update at the freeze day is to late. We could not even test how it impact our example apps. I am trying to find code fix.

gogonzo commented 2 years ago

Yup, I'm convinced. Let's freeze code without this

gogonzo commented 2 years ago

Let's do this after release is done. We will need to upgrade ggplot2 in nest_dependencies (yaml or lock file)

nikolas-burkoff commented 2 years ago

Note when we update ggplot2 to >= 3.3.4 we should update osprey (see for example osprey's patient profile roxygen app):

image

(Though we need to test if the new way of doing things breaks older ggplot2 versions, if so we need to live with this message or understand the consequences of having a hard constraint on version of ggplot2 >= 3.3.4

gogonzo commented 2 years ago

I just checked what has changed in ggplot2 between 3.3.3 and 3.3.5 and it makes sense that in 3.3.3 manual_scale (color_scale_manual) couldn't properly match limits. In the new version limits are taken from the names of the values vector

https://github.com/tidyverse/ggplot2/blob/82c5459b880022dc39b4abb25c924c96e28f7e42/R/scale-manual.r#L136

and I think it's possible to add limits = data$color_var in the color_scale_manual

https://github.com/insightsengineering/goshawk/blob/44e370b624527c250d66a970dd6ae07d3131723e/R/geom_axes_line.R#L227

I guess this would work also on the previous versions of ggplot2