inlabru-org / inlabru

inlabru
https://inlabru-org.github.io/inlabru/
78 stars 21 forks source link

Error: "Error in B[[label]] %*% state[[label]]" with initial values and issue with keep option #192

Closed osupplisson closed 1 year ago

osupplisson commented 1 year ago

Hi,

Following an answer from Finn regarding how to fit a non-linear model using Inla (see here on R-inla discussion for context), I gave it a try with inlabru, which is an absolutely wonderful and intuitive package: thank you very much once again !

In my attempt to fit the model and ease convergence, I wanted to build the model step by step starting from n-1 components, then adding non-linearity, then adding another components, etc. In this process, I went across the following error:

Error in B[[label]] %*% state[[label]] : 
méthode pas encore implémentée pour <dgCMatrix> %*% <NULL>

(meaning method not yet implemented for <dgCMatrix> %*% <NULL>)

The error seems to be triggered by the bru_initial argument in the bru_options_set() function when you try to use as starting values a model with n-1 components for a model with n components.

Maybe I misunderstood how to use the option and past starting values should also be pass in the bru() call in inla's control.mode option ? Do you have some workaround for using a model with n-1 components as starting values for a model with n components? As a fallback solution, I am currently manually specifying the 'initial' argument in the components using past-run mean in the internal scale (past_fit$internal.summary.hyperpar) but I don't think it's equivalent to bru_options_set()?

I would also like to use this post to ask another unrelated small question: is there a way to not store inla results in the directory ? I tried with INLA 'keep=FALSE' option and does not seem to work (bru_options_set( keep = FALSE) and also in the bru()call). Some model need a lot of iteration to converge and my HPC partition is running out of memory because of all the files saved...

Best, Olivier

Here is a small reproducible example:

library(tidyverse)
library(inlabru)

n <- 10
x <- rnorm(10,0,1)
y <- runif(10,-1,1)
id_1 <-  c(rep(1,n/3),rep(2,n/3),rep(2,n/3))
id_2 <- c(rep(1,n/2),rep(2,n/2))

data <-cbind(x,
            y,
            id_1,
            id_2)

bru_options_set(bru_initial = NULL)

# Model with 1 component-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
likelihood1 <- like(
  family = "gaussian",
  data = data,
  formula = y ~ Intercept + id1
)
fit1 <- bru(
  y ~ Intercept(1) + id1(id_1,model = 'iid') ,
  likelihood1
)

##Working
bru_options_set(bru_initial = fit1)
fit1_bis <- bru(
  y ~ Intercept(1) + id1(id_1,model = 'iid') ,
  likelihood1
)

# Exanped model to 2 components-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bru_options_set(bru_initial = fit1)

##Not working
likelihood2 <- like(
  family = "gaussian",
  data = data,
  formula = y ~ Intercept + id1 + id2
)
fit2 <- bru(
  y ~ Intercept(1) + id1(id_1,model = 'iid') + id2(id_2,model = 'iid'),
  likelihood2
)

##Working
bru_options_set(bru_initial = NULL)
fit2 <- bru(
  y ~ Intercept(1) + id1(id_1,model = 'iid') + id2(id_2,model = 'iid'),
  likelihood2
)

> sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22000)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] inlabru_2.7.0.9010 sp_1.6-0           lubridate_1.9.2    forcats_1.0.0      stringr_1.5.0      dplyr_1.1.2       
 [7] purrr_1.0.1        readr_2.1.4        tidyr_1.3.0        tibble_3.2.1       ggplot2_3.4.2      tidyverse_2.0.0   

loaded via a namespace (and not attached):
 [1] s2_1.1.4           utf8_1.2.3         generics_0.1.3     class_7.3-21       KernSmooth_2.23-20 stringi_1.7.12    
 [7] lattice_0.21-8     hms_1.1.3          magrittr_2.0.3     timechange_0.2.0   grid_4.3.0         plyr_1.8.8        
[13] Matrix_1.5-4       e1071_1.7-13       DBI_1.1.3          fansi_1.0.4        tidytable_0.10.1   scales_1.2.1      
[19] cli_3.6.1          rlang_1.1.1        units_0.8-2        INLA_23.05.30-1    splines_4.3.0      munsell_0.5.0     
[25] withr_2.5.0        parallel_4.3.0     tools_4.3.0        MatrixModels_0.5-1 tzdb_0.4.0         deldir_1.0-9      
[31] colorspace_2.1-0   boot_1.3-28.1      vctrs_0.6.2        R6_2.5.1           proxy_0.4-27       lifecycle_1.0.3   
[37] classInt_0.4-9     spdep_1.2-8        pkgconfig_2.0.3    pillar_1.9.0       gtable_0.3.3       data.table_1.14.8 
[43] glue_1.6.2         Rcpp_1.0.10        sf_1.0-12          tidyselect_1.2.0   rstudioapi_0.14    wk_0.7.3          
[49] compiler_4.3.0     spData_2.2.2  
finnlindgren commented 1 year ago

For setting initial values, I would strongly recommend using bru(..., options = list(bru_initial = ...)) instead of setting a global option, as it should only be applied to a specific bru() call and not to all bru() calls.

But the main problem is that you have to set initial values that are compatible with the model; when you add a component, the previous result object isn't compatible. Instead you need to set initial values for the specific components that you have a previous estimate for, like bru_initial = list(Intercept =..., id1 = ...) (leaving out id2 leads it to start it with zeros).

For hyperparameter values, this approach doesn't quite work. If the main reason you're setting initial values are the hyperparameters, then you can instead modify the previous result object before using it, setting fit1$mode$x <- NULL so that it forgets about the latent mode. However, you'd then also need to modify fit1$mode$theta to include the new parameter, in the correct order.

In your text, you indicate that you're changing the size of the components, and not adding new components, but in that case the same thing applies; the initial value setting needs to be compatible with the new model; the good news is that if there aren't any new components, then the $theta vector doesn't need pre-updating.

finnlindgren commented 1 year ago

Regarding the keep=TRUE issue, INLA shouldn't keep the files; the default is keep=FALSE. Which files are being left behind?

osupplisson commented 1 year ago

Hi Finn,

Thank you very much for your help on this, I am now able to do what I wanted ! The model is currently working for 1 of my 5 outcomes :-).

Regarding the keep option, the INLA package says:

keep: Keep temporary files?

Therefore, I was expecting that setting keep = FALSE (either in bru() or as global options) would delete the temporary files between each iteration from the R temp/INLA working directory. However, temporary files are accumulating during bru() run, so I assume I am missing something :

image

Best, Olivier

finnlindgren commented 1 year ago

Yes, keep=FALSE should remove all the temporary files, so that sounds like a possible problem with INLA on your system. inlabru isn't involved in those temporary files; that's all in INLA::inla(). With keep=TRUE, the temporary folders are usually called inla.model and inla.model-1, etc, so I'm not sure if your -1, -2 etc indicate a problem with the folder/file naming or not; e.g. if it creates -1 but tries to remove inla.model-1, but I don't see how that would happen. If you run a basic inla() model with and without keep=TRUE, what happens?

fit1 <- inla(y ~ 1, family = "gaussian", data = data.frame(y = rnorm(10)), verbose = TRUE, keep = FALSE)
fit2 <- inla(y ~ 1, family = "gaussian", data = data.frame(y = rnorm(10)), verbose = TRUE, keep = TRUE)
osupplisson commented 1 year ago

In my code, I specified the INLA working.directory as global option, with the expectation that my INLA temporary files would go to a specific folder before being deleted.

It seems that keep = FALSE is overwritten and set by default to keep=TRUE whenever inla.setOption("working.directory", path_to_inla_directory) is set or working.directory = path_to_inla_directory is set in the call.

Your two calls have the expected behavior, but the call below doesn't remove the files at the end of the run for me:

fit1 <- inla(y ~ 1,
             family = "gaussian",
             data = data.frame(y = rnorm(10)), 
             working.directory = "inla.model/",
             verbose = TRUE,
             keep = FALSE)

This is indeed an INLA behavior and has nothing to do with inlabru...

I think the issue can be closed! Thank you very much once again.

Best, Olivier

finnlindgren commented 1 year ago

Yes, if working.directory is given, it overrides the keep option. That's documented in ?inla but not in ?inla.setOption, so the documentation can be improved.