Open njtierney opened 3 years ago
Just to add: we tried an IID (random intercept) version of this model adding new groups, and that worked as expected, sampling the new group values from the distribution. I see that that IID out of sample prediction is documented in predict.bru()
.
Mostly hoping there's a trick to doing this with more complicated models like this group AR1 and that we won't have to extract components and code it up ourselves :)
Out-of-sample prediction on the observation level is much easier than out-of-sample prediction for the latent field model.
INLA doesn't really implement observation level prediction at all, but in inlabru we can include it in the Monte Carlo part of the predict method without problem.
For predictions involving latent field components that were not part of the original model specification, the code would need to somehow determine how to automatically extend the definition domain and precision matrix structure for new nodes. This is not a generally well-defined problem!
The practical problem with auto-extending the latent components has two main issues:
mapper
argument can partially automate this) A couple of months ago, support for extending "iid" models to unseen indices was added to inlabru, since for that model there is really only one natural extension, and it's a really easy one to implement. For other models, even those with relatively natural extension, the setup needed is more complicated. A potential future solution is if someone has time to implement a sub-interface to inla for just constructing the latent model information as raw precision matrices. However, there are many features that would cause complications. E.g. for intrinsic random field models, it is extremely common to impose sum-to-zero or integrate-to-zero constraints, and these would also need to be handled (to avoid changing the meaning of the model they should apply only to the original nodes).
TL;DR; The easiest general solution (both in INLA and in inlabru) is to include the entire domain in the original latent model definition, covering both the observations and the domain intended for prediction.
The "prediction gave us a repeated single value" effect is due to how the default mapper
information is constructed internally. In the future, this may be tweaked a bit to generate warnings/errors in these situations, to force the user to explicitly ask for that behaviour.
Thanks Finn, that makes sense and I appreciate the detailed response!
For our purposes, we could try to set up the right simulation for grouped AR1 models (either on our end or as a PR to inlabru). But it may be a lot easier to have the users specify the full set of groups and timesteps in advance. Easier is probably what we should go with for this project.
Even if you only need it for group-ar1 models, a general solution would still need to deal with the whole set of main models. You can likely do it for a specific combination of main model with group-ar1, but the general case essentially would need to replicate a large chunk of the internal INLA code for setting up the precision matrices etc... Much easier to define the initial model to include the prediction domain; this also would leave no questions about the intended model definition. It would potentially involve handling constraints; this is one of the issues with intrinsic stationary random field models; very few people seem to realise the full impact of the intricate theory underpinning those models... But if you don't have any intrinsic stationary models that need extending, you should be fine. A regular Matern spde field is fine, as is ar1, for example, but rw1 and rw2 require more care.
If you have other feature requests/thoughts, please let me know. I will be doing some work on multi-likelihood models later this summer, as well as some other details to allow access to the full range of INLA models. This will involve extending the interface a bit, so it would be good to be aware of potential needs when designing the details of that. I tend to work from an interface design point of view, such as "I'd like to be able to specify this feature in this way, but also provide a coherent interface design that doesn't contradict itself"; the latest CRAN updates were all about changing the internal structure of inlabru to allow more extensions.
A potential route to a general solution to this problem could use the same idea as we've been thinking of for related problems such as sampling from the priors; To take a defined inlabru model and adjust aspects of it before calling inla to do the actual computations. This could in principle be done for the "extend the latent models in a specific way"; practically, one could then first do the estimation using the smaller model, then extend it, and run just the final prediction stage of inla (since the posterior mode would already be known). This may be the easiest way to automate this for general models, but requires some knowledge of the inlabru and inla model specification and input/output storage formats.
This topic has come up for me when writing a linear mixed model example for some inlabru materials I am writing. In the example we have observed groups 1:5
with a simple iid
random effect model. I want to include a way to predict for unobserved group 6
. One option is to sample the posterior hyperparameters for the effect and generate posterior samples "by hand" as it were.
Or, perhaps this is easier, I could recommend augmenting the data used for fitting by including rows with NA response data corresponding to the unobserved prediction groups. If you do this then you get access to the INLA marginals for the unobserved groups, and you can sample them using generate()
as though they had been observed. This seems cleaner to me than samping hyper parameters and going from there, it's straightforward for the iid
or ar1
model but I can see it getting complicated very quickly.
Hi there,
We (@goldingn and myself) were wondering if and how inlabru handles out of sample prediction - so what happens if we get new groups added to the data and try and predict to them.
The prediction works, however, the predicted values we get appear to be identical to the last group in the series? We were wondering if perhaps there is an issue with how we've written the prediction, or if inla/inlabru doesn't do out of sample prediction?
simulate data
The two new groups appear to be identical to the last one ?
Created on 2021-06-14 by the reprex package (v2.0.0)
Session info
``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.1.0 (2021-05-18) #> os macOS Big Sur 10.16 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_AU.UTF-8 #> ctype en_AU.UTF-8 #> tz Australia/Perth #> date 2021-06-14 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib source #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0) #> backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0) #> broom 0.7.6 2021-04-05 [1] CRAN (R 4.1.0) #> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.1.0) #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.1.0) #> colorspace 2.0-1 2021-05-04 [1] CRAN (R 4.1.0) #> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0) #> curl 4.3.1 2021-04-30 [1] CRAN (R 4.1.0) #> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0) #> dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.1.0) #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0) #> dplyr * 1.0.6 2021-05-05 [1] CRAN (R 4.1.0) #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0) #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0) #> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0) #> farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.0) #> forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.1.0) #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0) #> generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0) #> ggplot2 * 3.3.3 2020-12-30 [1] CRAN (R 4.1.0) #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0) #> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0) #> haven 2.4.1 2021-04-23 [1] CRAN (R 4.1.0) #> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0) #> hms 1.1.0 2021-05-17 [1] CRAN (R 4.1.0) #> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0) #> httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0) #> INLA 21.05.02 2021-05-03 [1] local #> inlabru * 2.3.1 2021-03-22 [1] CRAN (R 4.1.0) #> jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.0) #> knitr 1.33 2021-04-24 [1] CRAN (R 4.1.0) #> labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.0) #> lattice 0.20-44 2021-05-02 [1] CRAN (R 4.1.0) #> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0) #> lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.1.0) #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0) #> Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.1.0) #> MatrixModels 0.5-0 2021-03-02 [1] CRAN (R 4.1.0) #> mime 0.10 2021-02-13 [1] CRAN (R 4.1.0) #> mnormt 2.0.2 2020-09-01 [1] CRAN (R 4.1.0) #> modelr 0.1.8 2020-05-19 [1] CRAN (R 4.1.0) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0) #> numDeriv 2016.8-1.1 2019-06-06 [1] CRAN (R 4.1.0) #> pillar 1.6.1 2021-05-16 [1] CRAN (R 4.1.0) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0) #> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0) #> R6 2.5.0 2020-10-28 [1] CRAN (R 4.1.0) #> Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.1.0) #> readr * 1.4.0 2020-10-05 [1] CRAN (R 4.1.0) #> readxl 1.3.1 2019-03-13 [1] CRAN (R 4.1.0) #> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.1.0) #> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0) #> rmarkdown 2.8 2021-05-07 [1] CRAN (R 4.1.0) #> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0) #> rvest 1.0.0 2021-03-09 [1] CRAN (R 4.1.0) #> scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0) #> sn 2.0.0 2021-03-28 [1] CRAN (R 4.1.0) #> sp * 1.4-5 2021-01-10 [1] CRAN (R 4.1.0) #> stringi 1.6.2 2021-05-17 [1] CRAN (R 4.1.0) #> stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.1.0) #> styler 1.4.1 2021-03-30 [1] CRAN (R 4.1.0) #> tibble * 3.1.2 2021-05-16 [1] CRAN (R 4.1.0) #> tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.1.0) #> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0) #> tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.1.0) #> tmvnsim 1.0-2 2016-12-15 [1] CRAN (R 4.1.0) #> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.1.0) #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0) #> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0) #> xfun 0.23 2021-05-15 [1] CRAN (R 4.1.0) #> xml2 1.3.2 2020-04-23 [1] CRAN (R 4.1.0) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0) #> #> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library ```