tm_g_km giving uninformative error at edge case

imazubi commented 3 years ago

Testing with STREAM data on BEE. When faceting by AGEGR1 (<65, >=65) and when filtering AGE in the filter panel to 20<=AGE<65, it throws an error.

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'
In addition: Warning message:
In h_grob_tbl_at_risk(data = data_plot, annot_tbl = annot_tbl, xlim = max(max_time,  :
  NAs introduced by coercion

See the Show R code:

source("https://raw.github.roche.com/NEST/nest_on_bee/master/bee_nest_utils.R")
bee_use_nest(release = "UAT_2021_09_29")

library(reticulate)
library(magrittr)
library(rtables)
library(ggplot2)
library(ggmosaic)
library(shiny)
library(teal)
library(teal.modules.general)
library(dplyr)
library(optimx)
library(tern)
library(teal.modules.clinical)

path <- "/opt/bee/analyses/stream2/master/2.11/stream2/doc/examples/molecule/project/efficacy/task01/outdata_vad/"
adsl_filename <- "adsl.sas7bdat"
adtte_filename <- "adtte.sas7bdat"
adrs_filename <- "adrs.sas7bdat"
adlb_filename <- "adlb.sas7bdat"
ADSL <- haven::read_sas(paste0(path, adsl_filename))
ADTTE <- haven::read_sas(paste0(path, adtte_filename))
ADRS <- haven::read_sas(paste0(path, adrs_filename))
ADLB <- haven::read_sas(paste0(path, adlb_filename))
char_vars_asl <- names(Filter(isTRUE, sapply(ADSL, is.character)))
adrs_labels <- rtables::var_labels(ADRS)
ADRS <- filter(ADRS, PARAMCD == "CBOR" & !(AVISITN == 999))
var_labels(ADRS) <- adrs_labels
adlb_labels <- rtables::var_labels(ADLB)
ADLB <- ADLB %>%
  distinct(STUDYID, USUBJID, PARAMCD, AVISIT, .keep_all = TRUE) %>%
  filter(ABLFL != "Y") %>%
  filter(AVISIT %in% c("CYCLE 1 DAY 1", "CYCLE 1 DAY 8", "CYCLE 1 DAY 15")) %>%
  mutate(AVISIT = as.factor(AVISIT), AVISITN = rank(AVISITN) %>% as.factor() %>% as.numeric() %>% as.factor())
var_labels(ADLB) <- adlb_labels
ADSL <- df_explicit_na(ADSL)

## NOTE: Reproducibility of data import and preprocessing was not
## explicitly checked (argument "check = FALSE" is set).
## The app developer has the choice to check the reproducibility
## and might have omitted this step for some reason. Please reach
## out to the app developer for details.

# ADSL MD5 hash at the time of analysis: 3e14d96fdd54954b9619877f29226cd3
# ADTTE MD5 hash at the time of analysis: 42c4761c191b6be071c146a4ebb0e551

ADSL_FILTERED <- dplyr::filter(ADSL, AGE >= 20 & AGE <= 65)
ADTTE_FILTERED_ALONE <- ADTTE
ADTTE_FILTERED <- dplyr::inner_join(x = ADTTE_FILTERED_ALONE, y = ADSL_FILTERED[, c("STUDYID", "USUBJID"), drop = FALSE], by = c("STUDYID", "USUBJID"))

ANL_1 <- ADTTE_FILTERED %>% dplyr::select(STUDYID, USUBJID, PARAMCD, AVAL, CNSR, AVALU)
ANL_2 <- ADSL_FILTERED %>% dplyr::select(STUDYID, USUBJID, ARM, RACE, AGEGR1)
ANL_3 <- ADTTE_FILTERED %>%
  dplyr::filter(PARAMCD == "OS") %>%
  dplyr::select(STUDYID, USUBJID, PARAMCD)
ANL <- ANL_1
ANL <- dplyr::inner_join(ANL, ANL_2, by = c("STUDYID", "USUBJID"))
ANL <- dplyr::inner_join(ANL, ANL_3, by = c("STUDYID", "USUBJID", "PARAMCD"))
ANL <- ANL %>% rtables::var_relabel(AVAL = "Analysis Value", CNSR = "Censor", ARM = "Description of Planned Arm", PARAMCD = "Parameter Code", RACE = "Race", AGEGR1 = "Pooled Age Group 1", AVALU = "Analysis Value Unit")

anl <- ANL %>%
  filter(ARM %in% c("ARM A", "ARM B", "ARM C")) %>%
  mutate(ARM = relevel(ARM, ref = "ARM A")) %>%
  mutate(ARM = droplevels(ARM)) %>%
  mutate(is_event = CNSR == 0)
variables <- list(tte = "AVAL", is_event = "is_event", arm = "ARM", strat = "RACE")
grid::grid.newpage()
lyt <- grid::grid.layout(nrow = nlevels(ANL$AGEGR1), ncol = 1) %>%
  grid::viewport(layout = .) %>%
  grid::pushViewport()
result <- mapply(df = split(anl, f = anl$AGEGR1), nrow = seq_along(levels(anl$AGEGR1)), FUN = function(df_i, nrow_i) {
  if (nrow(df_i) == 0) {
    grid::grid.text("No data found for a given facet value.", x = 0.5, y = 0.5, vp = grid::viewport(layout.pos.row = nrow_i, layout.pos.col = 1))
  } else {
    g_km(df = df_i, variables = variables, font_size = 8L, xlab = paste0("Time", " (", gsub("(^|[[:space:]])([[:alpha:]])", "\\1\\U\\2", tolower(anl$AVALU[1]), perl = TRUE), ")"), yval = "Survival", xticks = NULL, newpage = FALSE, title = paste("KM Plot of OS", ",", quote(AGEGR1), "=", as.character(unique(df_i$AGEGR1))), ggtheme = theme_minimal(), annot_surv_med = TRUE, annot_coxph = TRUE, control_surv = control_surv_timepoint(conf_level = 0.95), control_coxph_pw = control_coxph(
      conf_level = 0.95,
      pval_method = "log-rank", ties = "exact"
    ), ci_ribbon = FALSE, vp = grid::viewport(layout.pos.row = nrow_i, layout.pos.col = 1), draw = TRUE)
  }
}, SIMPLIFY = FALSE)
km_grobs <- tern::stack_grobs(grobs = result)
km_grobs

Try this out:

subjects_65 <- ADSL_FILTERED$USUBJID[ADSL_FILTERED$AGE == 65]
anl_65 <- anl %>% filter(USUBJID %in% subjects_65)

Theoretically, there are 25 subjects with AGE == 65 and have events and censored values.

imazubi commented 3 years ago

App code:

# Next two lines are for using NEST packages on BEE (r.roche.com)
source("https://raw.github.roche.com/NEST/nest_on_bee/master/bee_nest_utils.R")
bee_use_nest(release = "UAT_2021_09_29")

library(rtables)
library(teal.modules.general)
library(teal.modules.clinical)

## Log app usage
log_app_usage(ta = "Oncology", molecule = "Tecentriq", ind = "NSCLC", anl_type = "Exploratory")

# code>
## Generate Data
path <- "/opt/bee/analyses/stream2/master/2.11/stream2/doc/examples/molecule/project/efficacy/task01/outdata_vad/"
adsl_filename <- "adsl.sas7bdat"
adtte_filename <- "adtte.sas7bdat"
adrs_filename <- "adrs.sas7bdat"
adlb_filename <- "adlb.sas7bdat"

ADSL <- haven::read_sas(paste0(path, adsl_filename)) 
ADTTE <- haven::read_sas(paste0(path, adtte_filename)) 
ADRS <- haven::read_sas(paste0(path, adrs_filename)) 
ADLB <- haven::read_sas(paste0(path, adlb_filename)) 

adrs_labels <- rtables::var_labels(ADRS)
ADRS <- filter(ADRS, PARAMCD == "CBOR" & !(AVISITN == 999))
var_labels(ADRS) <- adrs_labels

adlb_labels <- rtables::var_labels(ADLB)
ADLB <- ADLB %>%
  distinct(STUDYID, USUBJID, PARAMCD, AVISIT, .keep_all = TRUE) %>% #if not mmrm gives an error saying there are duplicates for each visit
  filter(ABLFL != "Y") %>%
  filter(AVISIT %in% c("CYCLE 1 DAY 1", "CYCLE 1 DAY 8", "CYCLE 1 DAY 15")) %>%
  mutate(
    AVISIT = as.factor(AVISIT),
    AVISITN = rank(AVISITN) %>%
      as.factor() %>%
      as.numeric() %>%
      as.factor() 
  )
var_labels(ADLB) <- adlb_labels

ADSL <- df_explicit_na(ADSL)

# <code

## Reusable Configuration For Modules
arm_vars <- c("ARMCD", "ARM")
strata_vars <- c("RACE", "AGEGR1")
facet_vars <- c("AGEGR1", "SEX", "COUNTRY")
cov_vars <- c("SEX")
visit_vars <- c("AVISIT", "AVISITN")

cs_arm_var <- choices_selected(
  choices = variable_choices(ADSL, subset = arm_vars),
  selected = "ARM"
)

cs_strata_var <- choices_selected(
  choices = variable_choices(ADSL, subset = strata_vars),
  selected = "RACE"
)

cs_facet_var <- choices_selected(
  choices = variable_choices(ADSL, subset = facet_vars),
  selected = "AGEGR1"
)

cs_cov_var <- choices_selected(
  choices = variable_choices(ADSL, subset = cov_vars),
  selected = "SEX"
)

cs_paramcd_tte <- choices_selected(
  choices = value_choices(ADTTE, "PARAMCD", "PARAM"),
  selected = "OS"
)

cs_paramcd_rsp <- choices_selected(
  choices = value_choices(ADRS, "PARAMCD", "PARAM"),
  selected = "CBOR"
)

cs_paramcd_adlb <- choices_selected(
  choices = value_choices(ADLB, "PARAMCD", "PARAM"),
  selected = "ALBUMCV"
)

cs_visit_var_adlb <- choices_selected(
  choices = variable_choices(ADLB, subset = visit_vars),
  selected = "AVISIT"
)

# cs_formula_mmrm <- choices_selected(
#   choices = c('BASE + AVISITN + ARMCD + ARMCD*AVISITN + SEX',
#               'BASE + AVISIT + ARMCD + ARMCD*AVISIT + SEX',
#               'BASE + AVISIT + ARM + ARM*AVISIT + SEX'),
#   selected = 'BASE + AVISIT + ARM + ARM*AVISIT + SEX')

# fact_vars_asl <- names(Filter(isTRUE, sapply(ADSL, is.factor)))
# fact_vars_asl_orig <- fact_vars_asl[!fact_vars_asl %in% char_vars_asl]

fact_vars_asl <- names(Filter(isTRUE, sapply(ADSL, is.factor)))
char_vars_asl <- names(Filter(isTRUE, sapply(ADSL, is.character)))

date_vars_asl <- names(ADSL)[vapply(ADSL, function(x) inherits(x, c("Date", "POSIXct", "POSIXlt")), logical(1))]
demog_vars_asl <- names(ADSL)[!(names(ADSL) %in% c("USUBJID", "STUDYID", date_vars_asl))]

# reference & comparison arm selection when switching the arm variable
# ARMCD is given in a delayed fashion using value choices and
# ARM is given with the ref and comp levels supplied explicitly
arm_ref_comp <- list(
  ARMCD = list(ref = value_choices("ADSL", var_choices = "ARMCD", var_label = "ARM", subset = "A"), 
               comp = value_choices("ADSL", var_choices = "ARMCD", var_label = "ARM", subset = c("B", "C"))),
  ARM = list(ref = "ARM A", comp = c("ARM B", "ARM C"))
)

## Setup App
app <- init(
  data = cdisc_data(
    cdisc_dataset(dataname = "ADSL", x = ADSL, vars = list(char_vars_asl = char_vars_asl)),
    cdisc_dataset(dataname = "ADRS", x = ADRS, keys = c(get_cdisc_keys("ADRS"), "RSSEQ")),
    cdisc_dataset("ADTTE", ADTTE),
    cdisc_dataset(dataname = "ADLB", x = ADLB, keys = c(get_cdisc_keys("ADLB"), "LBSEQ")), 
    code = get_code("app.R",exclude_comments = TRUE, read_sources = TRUE), 
    check = TRUE
  ),
  modules = root_modules(
    module(
      label = "Study Information",
      server = function(input, output, session, datasets) {},
      ui = function(id, ...) {
        tagList(
          tags$p("Info about data source:"),
          tags$p(
            "Random data are used that has been created with the ",
            tags$code("random.cdisc.data"), "R package."
          )
        )
      },
      filters = "all"
    ),
    tm_data_table("Data Table"),
    tm_variable_browser("Variable Browser"),
    tm_t_summary(
      label = "Demographic Table",
      dataname = "ADSL",
      arm_var = cs_arm_var,
      summarize_vars = choices_selected(
        choices = variable_choices(ADSL, demog_vars_asl),
        selected = c("SEX", "AGE", "RACE")
      )
    ),
    modules(
      "Forest Plots",
      tm_g_forest_tte(
        label = "Survival Forest Plot",
        dataname = "ADTTE",
        arm_var = cs_arm_var,
        strata_var = cs_strata_var,
        subgroup_var = cs_facet_var,
        paramcd = cs_paramcd_tte,
        plot_height = c(800L, 200L, 4000L)
      ),
      tm_g_forest_rsp(
        label = "Response Forest Plot",
        dataname = "ADRS",
        arm_var = cs_arm_var,
        strata_var = cs_strata_var,
        subgroup_var = cs_facet_var,
        paramcd = cs_paramcd_rsp,
        plot_height = c(800L, 200L, 4000L)
      )
    ),
    tm_g_km(
      label = "Kaplan Meier Plot",
      dataname = "ADTTE",
      arm_var = cs_arm_var,
      arm_ref_comp = arm_ref_comp,
      paramcd = cs_paramcd_tte,
      facet_var = cs_facet_var,
      strata_var = cs_strata_var,
      plot_height = c(1800L, 200L, 4000L)
    ),
    tm_t_rsp(
      label = "Response Table",
      dataname = "ADRS",
      arm_var = cs_arm_var,
      arm_ref_comp = arm_ref_comp,
      paramcd = cs_paramcd_rsp,
      strata_var = cs_strata_var
    ),
    tm_t_tte(
      label = "Time To Event Table",
      dataname = "ADTTE",
      arm_var = cs_arm_var,
      paramcd = cs_paramcd_tte,
      strata_var = cs_strata_var,
      time_points = choices_selected(c(182, 365, 547), 182),
      event_desc_var = choices_selected(
        choices = variable_choices("ADTTE", "EVNTDESC"), 
        selected = "EVNTDESC", 
        fixed = TRUE
      )
    ),
    tm_t_crosstable(
      "Cross Table",
      x = data_extract_spec(
        dataname = "ADSL",
        select = select_spec(
          choices = variable_choices(ADSL, fact_vars_asl),
          selected = fact_vars_asl[fact_vars_asl == "RACE"]
        )
      ),
      y = data_extract_spec(
        dataname = "ADSL",
        select = select_spec(
          choices = variable_choices(ADSL, fact_vars_asl),
          selected = fact_vars_asl[fact_vars_asl == "ETHNIC"]
        )
      )
    ),
    tm_t_coxreg(
      label = "Cox Reg",
      dataname = "ADTTE",
      arm_var = cs_arm_var,
      arm_ref_comp = arm_ref_comp,
      paramcd = cs_paramcd_tte,
      strata_var = cs_strata_var,
      cov_var = cs_cov_var
    ),
    tm_t_logistic(
      label = "Logistic Reg",
      dataname = "ADRS",
      arm_var = cs_arm_var,
      arm_ref_comp = NULL,
      paramcd = cs_paramcd_rsp,
      cov_var = cs_cov_var
    ),
    tm_a_mmrm(
      label = "MMRM",
      dataname = "ADLB",
      aval_var = choices_selected(c("AVAL", "CHG"), "AVAL"),
      id_var = choices_selected(c("USUBJID", "SUBJID"), "USUBJID"),
      arm_var = cs_arm_var,
      visit_var = cs_visit_var_adlb,
      arm_ref_comp = arm_ref_comp,
      paramcd = cs_paramcd_adlb,
      cov_var = choices_selected(c("BASE", "SEX", "BASE*AVISIT", "ARM:SEX"), NULL),
      conf_level = choices_selected(c(0.95, 0.9, 0.8), 0.95)
    ),
    tm_t_binary_outcome(
      label = "Binary Response",
      dataname = "ADRS",
      arm_var = cs_arm_var,
      paramcd = cs_paramcd_rsp,
      strata_var = cs_strata_var
    ),
    tm_t_ancova(
      label = "ANCOVA",
      dataname = "ADLB",
      avisit = choices_selected(value_choices(ADLB, "AVISIT"), "CYCLE 1 DAY 1"),
      arm_var = cs_arm_var,
      arm_ref_comp = arm_ref_comp,
      aval_var = choices_selected(variable_choices(ADLB, c("AVAL", "CHG", "PCHG")), "CHG"),
      cov_var = choices_selected(variable_choices(ADLB, c("BASE", "SEX")), "BASE"),
      paramcd = cs_paramcd_adlb
    )
  ),
  header = div(
    class = "",
    style = "margin-bottom: 2px;",
    tags$h1("Example App with teal.modules.clinical modules", tags$span("SPA", class = "pull-right"))
  ),
  footer = tags$p(class = "text-muted", "Source: agile-R website")
)

shinyApp(app$ui, app$server)

imazubi commented 3 years ago

I have been doing some debugging on this.

Just taking anl_65 <- anl %>% filter(AGEGR1 == ">=65") as it is this facet part which gives an error:

What is happening here is that all records from anl_65 belong to ARM A.

> unique(anl_65$ARM)
[1] ARM A
Levels: ARM A ARM B ARM C

Hence, h_tbl_coxph_pairwise is returning NULL as there are no ARMS to go for pairwise comparisons in s_coxph_pairwise within h_tbl_coxph_pairwise.

This is an output that h_tbl_coxph_pairwise should return in non-edge conditions.

Finally, h_grob_coxph, which is a wrapper of h_tbl_coxph_pairwise , gives the final error:

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'

Concluding, this is a tern related error. In case we replace df[[arm]] <- factor(df[[arm]]) by df[[arm]] <- as.factor(df[[arm]]) (line 1154 from kaplan_meier_plot.R) we would keep the ARM levels and we would be somehow able to provide an empty dataframe to s_coxph_pairwise analysis function (and return NAs if nrow(df) == 0?) Should we try to somehow catch this error in tern (not an easy task as there are many downstream dependencies) and just show KM plot (with one curve) with NA-s in COX related tables, or just modify the template from tm_g_km where an informative message appears (and not the plot) when levels(df_i$ARM) <= 1 ?

What do you think @anajens ?

anajens commented 3 years ago

Thanks so much for details @imazubi! Based on your example I had a quick look and spotted a few other things in g_km that we need to improve. For example, even a single level KM curve with STUDYID as "arm" has a warning and no numbers for patients at risk:

So for now I'm going to block this and open up the downstream issues in tern.

imazubi commented 3 years ago

@anajens this issue would be solved with https://github.com/insightsengineering/tern/pull/204

I created new variable AGEGR1 in our scda ADSL so in the following way:

ADSL <- ADSL %>% mutate(AGEGR1 = factor(ifelse(AGE<45 & ARMCD == "ARM A", "<45", ">=45")))

So basically, AGEGR1 = "<45" only for ARM A, which was the error scenario (no comparator level to go through pairwise cox model). As you see in the screenshot, the error does not exist anymore but it displays the Km curve without the coxph table.

If it is okay for you, we can close it or we can test it in tomorrow's UAT version with real data.

anajens commented 3 years ago

Let's leave this open as a reminder to check with Friday's deployed apps.

imazubi commented 3 years ago

Verified and OK

insightsengineering / teal.modules.clinical

tm_g_km giving uninformative error at edge case #139