Errors when right-censoring: bug in normal_lccdf?

simoncolumbus commented 1 year ago

I've been running into issues fitting models with a right-censored normal distribution, using brms. I've submitted an issue with brms here: https://github.com/paul-buerkner/brms/issues/1423 However, it appears that the error does not arise from brms. @paul-buerkner suggested it might instead arise from a bug in normal_lccdf, so I thought I'd post it here.

I appears that the issue occurs only

using the gaussian family; others such as student or weibull work fine;
when right-censoring, whereas left-censoring works fine (suggesting normal_lccdf as the culprit);
quite recently (as others report running such models a few months back without issue).

Below is my original post to the brms issue tracker, containing a MWE. Apologies for not testing this with other Stan interfaces.

# Censoring example
set.seed(25)
n <- 5000
t1 <- 106 # Right censoring cutoff
t2 <- 94 # Left censoring cutoff

d <- tibble(y = rnorm(n, mean = 100, sd = 15),
            x = rnorm(n, 0, 1)) %>% 
  # Right censored variables
  mutate(y1   = if_else(y > t1, t1, y),
         cen1 = if_else(y > t1, "right", "none")) %>%
  # Left censored variables
  mutate(y2 = if_else(y < t2, t2, y),
         cen2 = if_else(y < t2, "left", "none"))

# Right censoring
fit <- brm(bf(y1 | cens(cen1) ~ x),
    data = d)

Compiling Stan program...
Start sampling
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
...
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y1 | cens(cen1) ~ x 
   Data: d (Number of observations: 5000)

This occurs across multiple datasets and different kinds of models (linear mixed model with one random factor; mixed-effects location scale model), and when init = 0.

Stan code for this model (generated with brms 2.17.0):

functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=-1,upper=2> cens[N];  // indicates censoring
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept | 3, 99.9, 9);
  lprior += student_t_lpdf(sigma | 3, 0, 9)
    - 1 * student_t_lccdf(0 | 3, 0, 9);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + Xc * b;
    for (n in 1:N) {
    // special treatment of censored data
      if (cens[n] == 0) {
        target += normal_lpdf(Y[n] | mu[n], sigma);
      } else if (cens[n] == 1) {
        target += normal_lccdf(Y[n] | mu[n], sigma);
      } else if (cens[n] == -1) {
        target += normal_lcdf(Y[n] | mu[n], sigma);
      }
    }
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

In contrast, left-censoring works:

# Left censoring
brm(bf(y2 | cens(cen2) ~ x),
    data = d)

I tested this on two machines.

# Machine 1
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] brms_2.17.0     Rcpp_1.0.9      forcats_0.5.1   stringr_1.4.0   dplyr_1.0.6     purrr_0.3.4     readr_1.4.0     tidyr_1.1.3     tibble_3.1.8   
[10] ggplot2_3.3.6   tidyverse_1.3.1

# Machine 2
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8    LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] brms_2.17.0     Rcpp_1.0.9      forcats_0.5.2   stringr_1.4.1   dplyr_1.0.9     purrr_0.3.4     readr_2.1.2     tidyr_1.2.0     tibble_3.1.8   
[10] ggplot2_3.3.6   tidyverse_1.3.2

paul-buerkner commented 1 year ago

Not sure who would be the best person to tag here but I think this issue deserves some attention given that normal_lccdf is quite a relevant function. If it turns out to indeed be a bug in that function, we should aim to fix it sooner rather than later. @wds15 would you mind checking or do you have a suggestion for who to tag here?

nhuurre commented 1 year ago

This is probably a duplicate of #1985 .

paul-buerkner commented 1 year ago

Yes, indeed, thank you!

wds15 commented 1 year ago

@paul-buerkner I think @nhuurre is on the spot. The cdf & ccdf's are not great in their precision and looking through the history of things here it's unfortunate that this has not yet been resolved. For brms one could try to solve the matter by resorting to normal_lcdf and sign-flipping the mean possibly? This obviously needs to be solve in Stan-math eventually...but rstan is still so much behind such that most R users would love the workaround...

paul-buerkner commented 1 year ago

I agree. Not sure, how a workaround would look like in brms though, given brms strict naming conventions on functions and our inability to locally overwrite built-in with custom functions (I think?)

wds15 commented 1 year ago

Not sure either. You can't overload a built in function, no. You'd need to rename this. Maybe normalcens_lccdf will be the distribution and then you can create a user function handling this.

stan-dev / math

Errors when right-censoring: bug in normal_lccdf? #2847