inlabru-org / inlabru

inlabru
https://inlabru-org.github.io/inlabru/
78 stars 21 forks source link

predict with model = "bym" #131

Closed ASeatonSpatial closed 2 years ago

ASeatonSpatial commented 2 years ago

I noticed a potential bug when predicting from a "bym" model. See a reprex below which requires the SpatialEpi package that has some areal unit data.

If predicting without supplying new data, the predictions seem sensible. However, if passing a data object to the predict() function, it doesn't seem to return a prediction dataframe with sensible dimensions. I wonder if something is maybe breaking with the indexing since a bym model is a bit different, the effect is of length 2n, where n is number of nodes in the graph. If predicting without a data object it's possible to just subset this as required as I do below.

# Fit BYM model to Scotland lip cancer dataset
library(INLA)
#> Warning: package 'INLA' was built under R version 4.1.2
#> Loading required package: Matrix
#> Loading required package: foreach
#> Loading required package: parallel
#> Loading required package: sp
#> This is INLA_21.11.22 built 2021-11-21 16:10:15 UTC.
#>  - See www.r-inla.org/contact-us for how to get help.
#>  - To enable PARDISO sparse library; see inla.pardiso()
library(inlabru)
library(ggplot2)
library(patchwork)
library(spdep)
#> Loading required package: spData
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source')`
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(SpatialEpi)

# load data
data(scotland)
lip_data = scotland$spatial.polygon
lip_data$cases = scotland$data$cases
lip_data$county.names = scotland$geo$county.names
lip_data$county.id = 1:nrow(lip_data)
lip_data$E = scotland$data$expected   # exposure
lip_data = spTransform(lip_data,
                       CRS("+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=km +no_defs"))
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
#> prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

# adjacency matrix
nb = poly2nb(lip_data)
nb2INLA("scot.adj", nb)
g = inla.read.graph(filename = "scot.adj")
summary(g)
#>  n =  56 
#>  ncc =  4 
#>  nnbs = (names)   0  1  2  3  4  5  6  7  8  9 11 
#>         (count)   3  1  6 11 14  8  8  1  1  2  1

# try BYM model

cmp = cases ~ Intercept(1) + 
  bym_eff(county.id,
          model = "bym",
          graph = g,
          scale.model = TRUE)

fit = bru(components = cmp,
          data = lip_data,
          E = lip_data$E,
          family = "poisson")

# predict
pred = predict(fit, formula = ~ exp(Intercept_latent + bym_eff_latent))
nrow(lip_data)
#> [1] 56
nrow(pred)
#> [1] 112

# take first 56 which is u + v (see inla.doc("bym"))
lip_data$mean_pred_bym = pred$mean[1:56] * lip_data$E

vals = c(lip_data$cases, lip_data$mean_pred_bym)
p1 = ggplot() + 
  gg(lip_data, 
     aes(fill = mean_pred_bym),
     alpha = 1) +
  scale_fill_viridis_c(limits = c(min(vals), max(vals))) + 
  theme_minimal() +
  coord_equal()
#> Regions defined for each Polygons

p2 = ggplot() + 
  gg(lip_data, 
     aes(fill = cases),
     alpha = 1) +
  scale_fill_viridis_c(limits = c(min(vals), max(vals))) + 
  theme_minimal() +
  coord_equal()
#> Regions defined for each Polygons

p1 + p2


# try passing data to predict call
pred = predict(fit, 
               data = lip_data,
               formula = ~ exp(Intercept + bym_eff))
nrow(lip_data)
#> [1] 56
nrow(pred)
#> [1] 2

# only 2?

# try a data frame
pred_df = data.frame(county.id = 1:10)
pred = predict(fit, 
               data = pred_df,
               formula = ~ exp(Intercept + bym_eff))

nrow(pred_df)
#> [1] 10
nrow(pred)
#> [1] 2

# still 2

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.5 (2021-03-31)
#>  os       Ubuntu 20.04.3 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_GB:en                    
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2022-01-12                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date       lib source                              
#>  backports      1.2.1      2020-12-09 [1] CRAN (R 4.0.3)                      
#>  boot           1.3-28     2021-05-03 [1] CRAN (R 4.0.5)                      
#>  cachem         1.0.6      2021-08-19 [1] CRAN (R 4.0.5)                      
#>  callr          3.7.0      2021-04-20 [1] CRAN (R 4.0.5)                      
#>  class          7.3-19     2021-05-03 [4] CRAN (R 4.0.5)                      
#>  classInt       0.4-3      2020-04-07 [1] CRAN (R 4.0.3)                      
#>  cli            3.1.0      2021-10-27 [1] CRAN (R 4.0.5)                      
#>  coda           0.19-4     2020-09-30 [1] CRAN (R 4.0.5)                      
#>  codetools      0.2-18     2020-11-04 [1] CRAN (R 4.0.3)                      
#>  colorspace     2.0-2      2021-06-24 [1] CRAN (R 4.0.5)                      
#>  crayon         1.4.2      2021-10-29 [1] CRAN (R 4.0.5)                      
#>  DBI            1.1.0      2019-12-15 [1] CRAN (R 4.0.3)                      
#>  deldir         0.2-3      2020-11-09 [1] CRAN (R 4.0.3)                      
#>  desc           1.4.0      2021-09-28 [1] CRAN (R 4.0.5)                      
#>  devtools       2.4.3      2021-11-30 [1] CRAN (R 4.0.5)                      
#>  digest         0.6.29     2021-12-01 [1] CRAN (R 4.0.5)                      
#>  dplyr          1.0.7      2021-06-18 [1] CRAN (R 4.0.5)                      
#>  e1071          1.7-4      2020-10-14 [1] CRAN (R 4.0.3)                      
#>  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.0.5)                      
#>  evaluate       0.14       2019-05-28 [1] CRAN (R 4.0.5)                      
#>  expm           0.999-5    2020-07-20 [1] CRAN (R 4.0.3)                      
#>  fansi          1.0.0      2022-01-10 [1] CRAN (R 4.0.5)                      
#>  farver         2.1.0      2021-02-28 [1] CRAN (R 4.0.5)                      
#>  fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.0.5)                      
#>  foreach      * 1.5.1      2020-10-15 [1] CRAN (R 4.0.3)                      
#>  foreign        0.8-81     2020-12-22 [1] CRAN (R 4.0.3)                      
#>  fs             1.5.0      2020-07-31 [1] CRAN (R 4.0.3)                      
#>  gdata          2.18.0     2017-06-06 [1] CRAN (R 4.0.3)                      
#>  generics       0.1.0      2020-10-31 [1] CRAN (R 4.0.5)                      
#>  ggplot2      * 3.3.5      2021-06-25 [1] CRAN (R 4.0.5)                      
#>  ggpolypath     0.1.0      2016-08-10 [1] CRAN (R 4.0.5)                      
#>  glue           1.6.0      2021-12-17 [1] CRAN (R 4.0.5)                      
#>  gmodels        2.18.1     2018-06-25 [1] CRAN (R 4.0.3)                      
#>  gtable         0.3.0      2019-03-25 [1] CRAN (R 4.0.5)                      
#>  gtools         3.8.2      2020-03-31 [1] CRAN (R 4.0.3)                      
#>  highr          0.9        2021-04-16 [1] CRAN (R 4.0.5)                      
#>  htmltools      0.5.2.9000 2021-09-06 [1] Github (rstudio/htmltools@6a03c3f)  
#>  INLA         * 21.11.22   2021-11-21 [1] local                               
#>  inlabru      * 2.4.0.9000 2022-01-12 [1] Github (inlabru-org/inlabru@1f12ded)
#>  iterators      1.0.13     2020-10-15 [1] CRAN (R 4.0.3)                      
#>  KernSmooth     2.23-20    2021-05-03 [4] CRAN (R 4.0.5)                      
#>  knitr          1.33       2021-04-24 [1] CRAN (R 4.0.5)                      
#>  labeling       0.4.2      2020-10-20 [1] CRAN (R 4.0.3)                      
#>  lattice        0.20-41    2020-04-02 [1] CRAN (R 4.0.3)                      
#>  LearnBayes     2.15.1     2018-03-18 [1] CRAN (R 4.0.3)                      
#>  lifecycle      1.0.1      2021-09-24 [1] CRAN (R 4.0.5)                      
#>  magrittr       2.0.1      2020-11-17 [1] CRAN (R 4.0.3)                      
#>  maptools       1.0-2      2020-08-24 [1] CRAN (R 4.0.3)                      
#>  MASS           7.3-54     2021-05-03 [4] CRAN (R 4.0.5)                      
#>  Matrix       * 1.3-0      2020-12-22 [1] CRAN (R 4.0.3)                      
#>  MatrixModels   0.4-1      2015-08-22 [1] CRAN (R 4.0.3)                      
#>  memoise        2.0.1      2021-11-26 [1] CRAN (R 4.0.5)                      
#>  mnormt         2.0.2      2020-09-01 [1] CRAN (R 4.0.3)                      
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 4.0.3)                      
#>  nlme           3.1-151    2020-12-10 [1] CRAN (R 4.0.3)                      
#>  numDeriv       2016.8-1.1 2019-06-06 [1] CRAN (R 4.0.3)                      
#>  patchwork    * 1.1.1      2020-12-17 [1] CRAN (R 4.0.5)                      
#>  pillar         1.6.4      2021-10-18 [1] CRAN (R 4.0.5)                      
#>  pkgbuild       1.3.1      2021-12-20 [1] CRAN (R 4.0.5)                      
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.0.3)                      
#>  pkgload        1.2.4      2021-11-30 [1] CRAN (R 4.0.5)                      
#>  plyr           1.8.6      2020-03-03 [1] CRAN (R 4.0.3)                      
#>  prettyunits    1.1.1      2020-01-24 [1] CRAN (R 4.0.5)                      
#>  processx       3.5.2      2021-04-30 [1] CRAN (R 4.0.5)                      
#>  ps             1.6.0      2021-02-28 [1] CRAN (R 4.0.5)                      
#>  purrr          0.3.4      2020-04-17 [1] CRAN (R 4.0.5)                      
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.0.5)                      
#>  raster         3.4-5      2020-11-14 [1] CRAN (R 4.0.3)                      
#>  Rcpp           1.0.7      2021-07-07 [1] CRAN (R 4.0.5)                      
#>  remotes        2.4.2      2021-11-30 [1] CRAN (R 4.0.5)                      
#>  reprex         2.0.1      2021-08-05 [1] CRAN (R 4.0.5)                      
#>  rgdal          1.5-28     2021-12-15 [1] CRAN (R 4.0.5)                      
#>  rgeos          0.5-9      2021-12-15 [1] CRAN (R 4.0.5)                      
#>  rlang          0.4.12     2021-10-18 [1] CRAN (R 4.0.5)                      
#>  rmarkdown      2.6        2020-12-14 [1] CRAN (R 4.0.3)                      
#>  rprojroot      2.0.2      2020-11-15 [1] CRAN (R 4.0.5)                      
#>  s2             1.0.7      2021-09-28 [1] CRAN (R 4.0.5)                      
#>  scales         1.1.1      2020-05-11 [1] CRAN (R 4.0.5)                      
#>  sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 4.0.3)                      
#>  sf           * 1.0-4      2021-11-14 [1] CRAN (R 4.0.5)                      
#>  sn             1.6-2      2020-05-26 [1] CRAN (R 4.0.3)                      
#>  sp           * 1.4-6      2021-11-14 [1] CRAN (R 4.0.5)                      
#>  SpatialEpi   * 1.2.7      2021-11-15 [1] CRAN (R 4.0.5)                      
#>  spData       * 0.3.8      2020-07-03 [1] CRAN (R 4.0.3)                      
#>  spdep        * 1.1-12     2021-11-09 [1] CRAN (R 4.0.5)                      
#>  stringi        1.6.2      2021-05-17 [1] CRAN (R 4.0.5)                      
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 4.0.5)                      
#>  styler         1.5.1      2021-07-13 [1] CRAN (R 4.0.5)                      
#>  testthat       3.1.1      2021-12-03 [1] CRAN (R 4.0.5)                      
#>  tibble         3.1.6      2021-11-07 [1] CRAN (R 4.0.5)                      
#>  tidyselect     1.1.1      2021-04-30 [1] CRAN (R 4.0.5)                      
#>  tmvnsim        1.0-2      2016-12-15 [1] CRAN (R 4.0.3)                      
#>  units          0.6-7      2020-06-13 [1] CRAN (R 4.0.3)                      
#>  usethis        2.1.5      2021-12-09 [1] CRAN (R 4.0.5)                      
#>  utf8           1.2.2      2021-07-24 [1] CRAN (R 4.0.5)                      
#>  vctrs          0.3.8      2021-04-29 [1] CRAN (R 4.0.5)                      
#>  viridisLite    0.4.0      2021-04-13 [1] CRAN (R 4.0.5)                      
#>  withr          2.4.3      2021-11-30 [1] CRAN (R 4.0.5)                      
#>  wk             0.5.0      2021-07-13 [1] CRAN (R 4.0.5)                      
#>  xfun           0.24       2021-06-15 [1] CRAN (R 4.0.5)                      
#>  yaml           2.2.1      2020-02-01 [1] CRAN (R 4.0.5)                      
#> 
#> [1] /home/andy/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

Created on 2022-01-12 by the reprex package (v2.0.1)

The desired behaviour I guess would be for predict(fit, data = pred_obj, ~ some_function(bym_eff)) to return an object with 2 times nrow(pred_obj) rows. This would apply some_function() to both the u + v and v parts of the bym_eff parameters.

finnlindgren commented 2 years ago

For bym_eff_latent, one should get the 2n length vector, as it's explicitly asking for the entire latent vector. For just the plain effect vector, it should give a vector of length n (the second half is hidden). With bym_eff_eval, one can choose to access arbitrary parts of the latent field, by indexing into either the first or second part, or both. The predict call that generates length 2 does seem to be a bug.

finnlindgren commented 2 years ago

The plain bym_eff is meant to act like it does in the estimation, giving a vector of length n, which is the effect "visible" in INLA.

ASeatonSpatial commented 2 years ago

Good to know bym_eff would just return a vector of length n.

So the desired behaviour would be nrow(pred) = 56 in this example?

# try passing data to predict call
pred = predict(fit, 
               data = lip_data,
               formula = ~ exp(Intercept + bym_eff))
nrow(lip_data)
#> [1] 56
nrow(pred)
#> [1] 2

And nrow(pred) = 10 in this one?

# try a data frame
pred_df = data.frame(county.id = 1:10)
pred = predict(fit, 
               data = pred_df,
               formula = ~ exp(Intercept + bym_eff))

nrow(pred_df)
#> [1] 10
nrow(pred)
#> [1] 2

But nrow(pred) = 2 in both.

finnlindgren commented 2 years ago

The problem is that inla_f=TRUE doesn't get set/propagated to ibm_amatrix.bru_mapper_collect when using generate, so it expects list input instead of just the first vector. A workaround is to use _eval with a list in the predict call:

pred = predict(fit,
               data = lip_data,
               formula = ~ exp(Intercept + bym_eff_eval(list(county.id))))
nrow(pred)
#> [1] 56

pred_df = data.frame(county.id = 1:10)
pred = predict(fit,
               data = pred_df,
               formula = ~ exp(Intercept + bym_eff_eval(list(county.id))))

nrow(pred)
#> [1] 10

I need to think about whether the inla_f logic is such that it should be activated in generate calls; it should probably be renamed if that's the case, or a separate mechanism added to keep the logic predictable.

finnlindgren commented 2 years ago

Use bym_eff_eval(list(v=...)) to access the hidden component separately.

finnlindgren commented 2 years ago

Bug resolved. But I see in the bym and bym2 documentation that the naming of the latent components doesn't match the current internal mapper names, (u,v), as the inla documentation uses (v+u,u). The inlabru mapper names should be changed to match the inla documentation, e.g. (vu,u)?