paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.28k stars 183 forks source link

feature request: Handling Inf in data #1694

Open wpetry opened 1 week ago

wpetry commented 1 week ago

Description of current behavior

When Inf or -Inf are encountered in data, brm passes these rows to Stan, which fails because it is not able to evaluate the lp at the initial values. I think the standard troubleshooting for this error is to specify init = ... and/or to use more informative priors. But Stan will fail with this same error regardless of the initial values or priors specified. This appears to be a fitting issue, when in reality the source of the problem is in the data.

reprex:

library(brms)

x <- 0:100
mu <- 10 + 0.3 * x
y <- rnorm(mu, sd = 2)
dat <- data.frame(x, y)
dat$y[1] <- Inf

mod <- brm(y ~ 1 + x, data = dat)  # fails with Stan initialization error
mod2 <- lm(y ~ 1 + x, data = dat)  # base R regression gives a (somewhat) informative error in the same circumstance

Desired feature behavior

I think the best approach would be to stop the model fitting with an informative error instead of a warning. Infinite values are likely artifacts of errors during the calculation of variables and warrant re-examination before fitting any model (e.g., dividing by 0, log-transforming 0, etc.).

A softer approach would be to drop rows containing infinite values with a warning on the R side, then pass the cleaned data to Stan for fitting. This mirrors the handling of rows containing NA (absent user-specified imputation with mi()). I don't favor this approach because I'm not able to think of cases when it's still reasonable to fit a model after learning that some of the variable values are infinite.

paul-buerkner commented 1 week ago

Thank you for opening this issue! I will address it in the next brms update.

wds15 commented 1 week ago

Isn't brms using Inf to flag special values sometimes? That is a useful thing sometimes, which I am doing myself sometimes. Throwing out a warning is certainly appropriate as Inf values can easily make Stan go crazy.