hrue / r-inla

This is the public repository for the r-inla project
GNU General Public License v2.0

Poisson INLA model fails for large datasets #73

Closed jim-rafferty closed 1 year ago

jim-rafferty commented 1 year ago

Hi,

A colleague and I are trying to build an INLA model with a dataset of ~3 million samples and 4 features. INLA fails with the error:

Error in inla.inlaprogram.has.crashed() : 
  The inla-program exited with an error. Unless you interupted it yourself, please rerun with verbose=TRUE and check the output carefully.
  If this does not help, please contact the developers at <help@r-inla.org>.
Error in inla.core.safe(formula = formula, family = family, contrasts = contrasts,  : 
  *** Fail to get good enough initial values. Maybe it is due to something else. 

After a bit of testing, we have found that a dataset of approximately 100k samples will train okay, but 200k will not. I initially thought the issue was memory related, but it is not: the error persists across multiple machines with different amounts of memory (up to 128 GB), and the upper limit on the number of samples is always the same. I tried running the model with verbose=TRUE as suggested, but the resulting output shows no errors. Curiously, if we fit a model with no predictors (i.e. run inla(y ~ 1, family = "poisson", ...)), there are no errors and everything works fine.

Is there some internal limit that we are running into? Thanks in advance.

R version: 4.2.2, INLA version: 22.4.16, sp_1.5-1

Minimal code to reproduce the error (mostly lifted from the tutorial):

library(INLA)

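set.seed(1)                    # fixed seed so the simulated data are reproducible
n = 200000                     # ~100k samples fits, 200k fails
x = runif(n)                   # single covariate
eta = 1 + x                    # linear predictor
lambda = exp(eta)              # log link
y = rpois(n, lambda = lambda)  # Poisson response

r = inla(y ~ 1 + x, family = "poisson",
         data = data.frame(y, x),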
         control.predictor = list(link = 1), verbose = TRUE)
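
For anyone wanting to narrow down the sample size at which the fit starts failing, the manual testing described above could be automated along these lines. This is only a sketch: `simulate_poisson`, `try_sizes` and `fit_inla` are hypothetical helper names, not part of INLA, and it assumes the crash surfaces as an ordinary R error that `tryCatch` can intercept (which is how it appears in the message above).

```r
# Simulate the Poisson data from the reproducer for a given sample size.
simulate_poisson <- function(n, seed = 1) {
  set.seed(seed)
  x <- runif(n)
  data.frame(y = rpois(n, lambda = exp(1 + x)), x = x)
}

# Try a fit at each size and record whether it completed without an R error.
# `fit_fun` is any function taking a data frame (hypothetical interface).
try_sizes <- function(sizes, fit_fun) {
  ok <- vapply(sizes, function(n) {
    tryCatch({
      fit_fun(simulate_poisson(n))
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  data.frame(n = sizes, ok = ok)
}

# Wrapper around the call from the reproducer (requires INLA to be installed).
fit_inla <- function(d) {
  INLA::inla(y ~ 1 + x, family = "poisson", data = d,
             control.predictor = list(link = 1))
}

# Example call, not run here because each fit can take a while:
# try_sizes(c(50000, 100000, 150000, 200000), fit_inla)
```

The result is a small data frame with one row per sample size, which makes it easy to see where the fits stop succeeding.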

hrue commented 1 year ago

upgrade to a recent testing version...
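
For readers unsure how to do this, a sketch of the upgrade follows. It assumes the testing repository URL given in the R-INLA download instructions is still current; the install step is wrapped in `if (FALSE)` so that nothing is installed just by sourcing the snippet.

```r
# Testing repository URL as listed in the R-INLA download instructions
# (assumed to be current; check www.r-inla.org if the install fails).
inla_testing_repo <- "https://inla.r-inla-download.org/R/testing"
repos <- c(getOption("repos"), INLA = inla_testing_repo)

# Not run automatically: installing modifies the local package library.
if (FALSE) {
  install.packages("INLA", repos = repos, dependencies = TRUE)
  # Alternatively, with INLA already installed:
  # INLA::inla.upgrade(testing = TRUE)
}

# Report the installed version, if any, to confirm the upgrade took effect.
if (requireNamespace("INLA", quietly = TRUE)) {
  print(packageVersion("INLA"))
}
```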

-- Håvard Rue Professor of Statistics Chair of the Statistics Program CEMSE Division King Abdullah University of Science and Technology Thuwal 23955-6900 Kingdom of Saudi Arabia

jim-rafferty commented 1 year ago

Fab, this seems to have done the trick. Thanks. FYI the latest version for R 4.2 seems to install and work okay on R 4.1 (I am working in a TRE so I can't easily upgrade the R version).

jim-rafferty commented 1 year ago

Hi again. I spoke too soon (kinda). We are having the same problem with a negative binomial model. Do you have any suggestions?

hrue commented 1 year ago

that must be related to something else. please provide an example

jim-rafferty commented 1 year ago

Thanks again for your help. On further investigation, I have found that building the model like this

r = inla(formula,
         family = "nbinomial",…

leads to the same issue as the original post, i.e. the model fails if the sample size is over about 100k. If we build the model like this:

r = inla(formula,
         family = "binomial",
         control.family = list(variant = 1)…

everything seems to work okay. We are now testing the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models.
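
A self-contained example of the negative binomial case, of the kind requested earlier in the thread, might look like the sketch below. It is not the model from the posts above: the formula `y ~ 1 + x` is borrowed from the Poisson reproducer, and the dispersion value `size = 2` and the helper name `fit_nbinomial` are assumptions made purely for illustration.

```r
# Simulate negative binomial counts with the same linear predictor as the
# Poisson reproducer. The dispersion parameter is a hypothetical choice.
set.seed(1)
n <- 200000
x <- runif(n)
mu <- exp(1 + x)
size <- 2
y <- rnbinom(n, size = size, mu = mu)
d <- data.frame(y, x)

# Fit with the "nbinomial" family (requires INLA to be installed).
fit_nbinomial <- function(d) {
  INLA::inla(y ~ 1 + x, family = "nbinomial", data = d, verbose = TRUE)
}

# Example call, not run here because the fit can take a while:
# r <- fit_nbinomial(d)
```

Reducing `n` until the fit succeeds would show whether the failure threshold matches the one seen in the original post.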