Dichotomous data? - Githubissues

wlandau commented 7 months ago

Should we extend brms.mmrm to support dichotomous data? I think brms straightforwardly accepts nonlinear link functions, but I am wondering how this would affect #53 and #40.

wlandau commented 7 months ago

For the time being, I restricted brm_model() to continuous families with identity links: a9cbfaa79f6a5c9f9497e9b8436c81fa1c8c24b8.

wlandau commented 7 months ago

There are two main challenges for dichotomous endpoints:

Estimated marginal means: #53
Conditional means priors: #40

For (2), the worst-case scenario is that non-continuous non-identity-link conditional means priors are not possible in brms, in which case we can disable conditional means priors for the non-continuous non-identity-link case. For (1), emmeans just reports marginal means on the scale of the link and makes no attempt to transform them to the scale of the response (see details below). In the case of our Bayesian model, all we have to do is invert the link function (apply the mean function) on the link-scale marginal means we already calculate. We might have to add custom support for each one, but this does not seem too hard.

``` r suppressPackageStartupMessages({ library(brms) library(coda) library(emmeans) library(mmrm) library(posterior) library(tidyverse) library(zoo) }) emm_options(sep = "|") # FEV data from the mmrm package, using LOCF and then LOCF reversed # to impute responses. data(fev_data, package = "mmrm") data <- fev_data %>% mutate(FEV1_CHG = FEV1 - FEV1_BL, USUBJID = as.character(USUBJID)) %>% select(-FEV1) %>% group_by(USUBJID) %>% complete( AVISIT, fill = as.list(.[1L, c("ARMCD", "FEV1_BL", "RACE", "SEX", "WEIGHT")]) ) %>% ungroup() %>% arrange(USUBJID, AVISIT) %>% group_by(USUBJID) %>% mutate(FEV1_CHG = na.locf(FEV1_CHG, na.rm = FALSE)) %>% mutate(FEV1_CHG = na.locf(FEV1_CHG, na.rm = FALSE, fromLast = TRUE)) %>% ungroup() %>% filter(!is.na(FEV1_CHG)) %>% mutate(FEV1_CHG = as.integer(FEV1_CHG > mean(FEV1_CHG))) # dichotomous data summary_data <- data %>% group_by(ARMCD, AVISIT) %>% summarize( source = "1_data", mean = mean(FEV1_CHG), lower = mean(FEV1_CHG) - qnorm(0.975) * sd(FEV1_CHG) / sqrt(n()), upper = mean(FEV1_CHG) + qnorm(0.975) * sd(FEV1_CHG) / sqrt(n()), .groups = "drop" ) # lm + emmeans model <- glm( FEV1_CHG ~ FEV1_BL + FEV1_BL:AVISIT + ARMCD + ARMCD:AVISIT + AVISIT + RACE + SEX + WEIGHT, data = data, family = binomial(link = "logit") ) summary_emmeans <- emmeans( object = model, specs = ~ARMCD:AVISIT, wt.nuis = "proportional", nuisance = c("USUBJID", "RACE", "SEX") ) %>% as.data.frame() %>% as_tibble() %>% select(ARMCD, AVISIT, emmean) %>% rename(emmeans = emmean) # custom marginal means from lm draws using a custom mapping # from model parameters to marginal means. proportional_factors <- model.matrix( object = FEV1_CHG ~ 0 + SEX + RACE, data = data ) %>% colMeans() %>% t() grid <- data %>% mutate(FEV1_BL = mean(FEV1_BL), FEV1_CHG = 0, WEIGHT = mean(WEIGHT)) %>% distinct(ARMCD, AVISIT, FEV1_BL, WEIGHT, FEV1_CHG) parameters <- model %>% coef() %>% t() %>% as_tibble() mapping <- model.matrix( object = FEV1_CHG ~ FEV1_BL + FEV1_BL:AVISIT + ARMCD + ARMCD:AVISIT + AVISIT + WEIGHT, data = grid ) %>% bind_cols(proportional_factors) stopifnot(all(colnames(parameters) %in% colnames(mapping))) mapping <- as.matrix(mapping)[, colnames(parameters)] rownames(mapping) <- paste(grid$ARMCD, grid$AVISIT, sep = "|") custom <- as.matrix(parameters) %*% t(mapping) %>% as.data.frame() %>% as_tibble() summary_custom <- custom %>% pivot_longer(everything()) %>% separate("name", c("ARMCD", "AVISIT")) %>% group_by(ARMCD, AVISIT) %>% summarize(custom = mean(value), .groups = "drop") # Compare results summary <- summary_custom %>% left_join(y = summary_emmeans, by = c("ARMCD", "AVISIT")) %>% mutate(difference = custom - emmeans) print(summary) #> # A tibble: 8 × 5 #> ARMCD AVISIT custom emmeans difference #> #> 1 PBO VIS1 -1.40 -1.40 0 #> 2 PBO VIS2 -1.43 -1.43 0 #> 3 PBO VIS3 -0.0198 -0.0198 -1.39e-17 #> 4 PBO VIS4 0.971 0.971 2.22e-16 #> 5 TRT VIS1 -0.568 -0.568 0 #> 6 TRT VIS2 0.131 0.131 1.11e-16 #> 7 TRT VIS3 1.08 1.08 0 #> 8 TRT VIS4 1.73 1.73 2.22e-16 ``` ^{Created on 2024-04-15 with [reprex v2.1.0](https://reprex.tidyverse.org)}

chstock commented 7 months ago

We may want to focus on "unconditional treatment effects" in the sense of the recently updated FDA guideline on Adjusting for Covariates in Randomized Clinical Trials (see section III C. on "Nonlinear Models").

wlandau commented 7 months ago

If I understand correctly, I think we are mostly using unconditional treatment effects, i.e. regress on RACE as a covariate but report non-RACE-specific marginal means and treatment effects. We only calculate conditional treatment effects when formally declaring a factor as a subgroup. Seems like this would carry over to from the continuous outcome case to the dichotomous outcome scenario just fine. Please let me know if I am missing something.

wlandau commented 7 months ago

I wonder, if we allow a non-identity link and dichotomous data, would we have to change the package name to something like brms.glmm?

wlandau commented 6 months ago

And would it make sense anymore to regress on baseline?

wlandau commented 6 months ago

If we use a link function for dichotomous data, we need a distributional family for dichotomous data, such as bernoulli. If we use a different family, the parameters change, and there could be any number of distributional parameters instead of the current sigma. This not only affects the "effect size" calculation in brm_marginal_draws() (we would just need to exclude it from the output), but also the way any non-sigma distributional parameters are specified in brm_formula(). And of course the Stan code is very different because the model is completely different. From make_stancode(), I see brms even models the unscaled residuals as actual parameters for the Bernoulli and binomial cases. Supporting any one family is a lot to ask, and supporting every family in brms is not feasible.

I am favor of excluding non-Gaussian families from the package, or at least waiting until we know exactly which family or families to bring within scope.

Transferring this issue thread to a discussion thread.

openpharma / brms.mmrm

Dichotomous data? #87