inlabru-org / inlabru

inlabru
https://inlabru-org.github.io/inlabru/
76 stars 21 forks source link

Two issues with 'besag' model #47

Closed tmeeha closed 3 years ago

tmeeha commented 5 years ago

First, thanks for all your hard work!!! Second, I am trying to use bru for a besag model, specifically a besag spatially varying coefficient model. Here are some bug observations.

First, don't name your graph 'g' when using bru(). It breaks when you do this. I think because there is a function named g within inlabru.

Second, looks like, even when using a different graph name, bru doesn't know what to do with the second weighting variable argument that is necessary in the model formula when you are specifying a 'besag' spatially varying coefficient model.

Best, Tim

Reproducible example:

start -----------------

library(INLA) library(inlabru) library(spdep) data(columbus)

adj matrix

g <- W <- as(as_dgRMatrix_listw(nb2listw(col.gal.nb)), "CsparseMatrix")

poly id

columbus$idx<-1:nrow(columbus)

inla works

form<- CRIME ~ INC + HOVAL + f(idx, model="besag", graph=g) inla1 <- inla(form, data=columbus) summary(inla1)

bru breaks when graph is named 'g'

form <- CRIME ~ INC + HOVAL + f(idx, model="besag", graph=g) bru1 <- bru(form, data=columbus, family="gaussian")

bru works when graph is not named 'g'

form <- CRIME ~ INC + HOVAL + location(map=idx, model="besag", graph=W) bru2 <- bru(form, data=columbus, family="gaussian") summary(bru2)

inla doesn't work after bru is called with graph 'g'

form<- CRIME ~ INC + HOVAL + f(idx, model="besag", graph=g) inla2 <- inla(form, data=columbus)

start over ---------------

rm(list=ls()) library(INLA) library(inlabru) library(spdep) data(columbus)

adj matrix

g <- W <- as(as_dgRMatrix_listw(nb2listw(col.gal.nb)), "CsparseMatrix")

poly id

columbus$idx<-1:nrow(columbus)

inla works for spatially varying slopes

form<- CRIME ~ INC + HOVAL + f(idx, INC, model="besag", graph=g) inla3 <- inla(form, data=columbus) summary(inla3)

bru doesn't work with SVC, even with graph not named 'g'

form <- CRIME ~ INC + HOVAL + f(idx, INC, model="besag", graph=W) bru3 <- bru(form, data=columbus, family="gaussian")

finnlindgren commented 5 years ago

[Copying in my reply from the r-inla list so I can remember and test it later]

yes, in the current inlabru version there is a "g" function with similar purpose as the INLA::f function that will interfere with component naming. I believe the experimental backend changes will remove this issue.

Instead of specifying f(..., weight), the idea in inlabru is to include it explicitly in the formula expression, and the formula is distinct from the model component specification (but in simple cases, the formula is inferred from the component spec). In the absence of other bugs, the model would be written as follows:

comp <- ~ INC + HOVAL + field(map = idx, model="besag", graph=W, n=the_graph_size) form <- CRIME ~ INC + HOVAL + field * INC bru3 <- bru(comp, like(data=columbus, family="gaussian", formula = form))

tmeeha commented 5 years ago

Thanks! I tried this:

start

library(INLA) library(inlabru) library(spdep) data(columbus)

adj matrix

W <- as(as_dgRMatrix_listw(nb2listw(col.gal.nb)), "CsparseMatrix")

poly id

columbus$idx<-1:nrow(columbus)

new SVC specification suggested

W_size <- 49 comp <- ~ INC + HOVAL + field(map=idx, model="besag", graph=W, n=W_size) form <- CRIME ~ INC + HOVAL + field * INC bru4 <- bru(comp, like(data=columbus, family="gaussian", formula=form))

But I get the error:

error

Error in (function (data, model, stackmaker, n = 10, result = NULL, family, : INLA returned message: Model component 'f(Intercept, ...)' have only NA values in 'Intercept'. In this case, either argument 'n' or 'values' must be spesified. In addition: Warning message: In (function (formula, family = "gaussian", contrasts = NULL, data, : Model component 'f(Intercept, ...)' have only NA values in 'Intercept'. In this case, either argument 'n' or 'values' must be spesified.

So I added an explicit intercept to the formula and it ran:

new SVC specification suggested - add Intercept

W_size <- 49 comp <- ~ INC + HOVAL + field(map=idx, model="besag", graph=W, n=W_size) form <- CRIME ~ Intercept + INC + HOVAL + field * INC bru5 <- bru(comp, like(data=columbus, family="gaussian", formula=form)) summary(bru5)

But the bru5 object doesn't appear to have estimates for the field * INC interaction anywhere.

look at bru5

summary(bru5) --- Fixed effects -------------------------------------------------------------------------------- mean sd 0.025quant 0.5quant 0.975quant mode INC -1.5387535 0.3328623 -2.1913414 -1.5400300 -0.87971011 -1.5424498 HOVAL -0.2622684 0.1031129 -0.4649008 -0.2625036 -0.05857286 -0.2629377 Intercept 67.1557612 4.6924959 57.8176413 67.1857116 76.31637065 67.24303 --- Random effects ------------------------------------------------------------------------------- field ranges: mean = [-0.00144106, 0.00644126], sd = [0.00894521, 0.0322305], quantiles = [-0.040069 : 0.084582] --- All hyper parameters (internal representation) ----------------------------------------------- mean sd 0.025quant Precision for the Gaussian observations 7.962429e-03 1.627073e-03 5.167252e-03 Precision for field 1.795707e+04 1.738944e+04 1.165127e+03 0.5quant 0.975quant mode Precision for the Gaussian observations 7.835242e-03 1.152239e-02 7.603452e-03 Precision for field 1.283550e+04 6.388724e+04 3.155054e+03

tmeeha commented 5 years ago

By the way, here is my setup:

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair" Platform: x86_64-w64-mingw32/x64 (64-bit) INLA_18.10.09 built 2018-10-10 inlabru, 2.1.9.999

finnlindgren commented 5 years ago

The automatic intercept seems to not work in this case; I'm a bit unsure of wha t the full reason for that is, but it's easy to rectify by adding "+ Intercept" to both the components and the formula.

A second issue is that since INC was used as the name of the effect of INC as well as used to scale the "field" component in the formula, the formula used the effect of INC instead of the raw data. This can be corrected by using a different name for the effect, so that the effect INC[i] * beta_inc of INC can be used at the same time as the covariate values INC[i].

This complication arises because inlabru interprets the formula expression almost like a regular R expression. The estimated parameters will have the same name as the effect, which is in itself an issue, but it's an issue with all common GLM software. Ideally, we would implement a simple standardised way to refer to the covariate, the parameter, and the combined effect, with different names without having to explicitly define all three names.

The resulting code and output (generated with the reprex::reprex() function):

library(INLA)
#> Loading required package: Matrix
#> Loading required package: sp
#> This is INLA_18.08.09 built 2018-08-20 14:23:47 UTC.
#> See www.r-inla.org/contact-us for how to get help.
#> To enable PARDISO sparse library; see inla.pardiso()
library(inlabru)
#> Loading required package: ggplot2
library(spdep)
#> Loading required package: spData
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source'))`
data(columbus)
# adj matrix
W <- as(as_dgRMatrix_listw(nb2listw(col.gal.nb)), "CsparseMatrix")
# poly id
columbus$idx<-1:nrow(columbus)

W_size <- 49
comp <- ~ INCeff(map = INC, model="linear") + HOVAL +
  field(map=idx, model="besag", graph=W) + Intercept
form <- CRIME ~ INCeff + HOVAL + field * INC + Intercept
bru4 <- bru(comp, like(data=columbus, family="gaussian", formula=form))
#> 
#> Max deviation from previous: 0.00594% of SD [stop if: <1%]
#> Convergence criterion met, stopping INLA iteration.
INLA:::summary.inla(bru4)
#> 
#> Call:
#> "(function (formula, family = \"gaussian\", contrasts = NULL, data,      quantiles = c(0.025, 0.5, 0.975), E = NULL, offset = NULL,      scale = NULL, weights = NULL, Ntrials = NULL, strata = NULL,      link.covariates = NULL, verbose = FALSE, lincomb = NULL,      control.compute = list(), control.predictor = list(), control.family = list(),      control.inla = list(), control.results = list(), control.fixed = list(),      control.mode = list(), control.expert = list(), control.hazard = list(),      control.lincomb = list(), control.update = list(), only.hyperparam = FALSE,      inla.call = inla.getOption(\"inla.call\"), inla.arg = inla.getOption(\"inla.arg\"),      num.threads = inla.getOption(\"num.threads\"), blas.num.threads = inla.getOption(\"blas.num.threads\"),      keep = inla.getOption(\"keep\"), working.directory = inla.getOption(\"working.directory\"),      silent = inla.getOption(\"silent\"), debug = inla.getOption(\"debug\"),      .parent.frame = parent.frame())  {     old.options = options()     on.exit(options(old.options))     options(OutDec = \".\")     my.time.used = numeric(4)     my.time.used[1] = Sys.time()     all.hyper = list()     data.orig = data     if (missing(formula)) {         stop(\"Usage: inla(formula, family, data, ...); see ?inla\\n\")     }     if (missing(data)) {         stop(\"Missing data.frame/list `data'. Leaving `data' empty might lead to\\n\\t\\tuncontrolled behaviour, therefore is it required.\")     }     if (!is.data.frame(data) && !is.list(data)) {         stop(\"\\n\\tArgument `data' must be a data.frame or a list.\")     }     if (!missing(weights) && !is.null(weights)) {         if (!inla.getOption(\"enable.inla.argument.weights\")) {             stop(paste(\"Argument 'weights' must be enabled before use due to the risk of mis-interpreting the results.\\n\",                  \"\\tUse 'inla.setOption(\\\"enable.inla.argument.weights\\\", TRUE)' to enable it; see ?inla\"))         }     }     data.model = NULL     if (is.list(data) && length(data) > 0L) {         if (any(nchar(names(data)) == 0L)) {            <<<the rest is not shown>>>"
#> 
#> Time used:
#>  Pre-processing    Running inla Post-processing           Total 
#>          0.8793          0.5529          0.9599          2.3921 
#> 
#> Fixed effects:
#>              mean     sd 0.025quant 0.5quant 0.975quant    mode kld
#> INCeff     0.1456 0.3992    -0.6366   0.1440     0.9363  0.1408   0
#> HOVAL     -0.5856 0.1130    -0.8074  -0.5859    -0.3623 -0.5865   0
#> Intercept 57.8242 5.0938    47.6931  57.8548    67.7736 57.9134   0
#> 
#> Random effects:
#> Name   Model
#>  field   Besags ICAR model 
#> 
#> Model hyperparameters:
#>                                              mean        sd 0.025quant
#> Precision for the Gaussian observations 1.902e+04 1.894e+04  1303.2655
#> Precision for field                     2.563e-01 5.230e-02     0.1659
#>                                          0.5quant 0.975quant      mode
#> Precision for the Gaussian observations 1.342e+04  6.946e+04 3572.0637
#> Precision for field                     2.524e-01  3.706e-01    0.2453
#> 
#> Expected number of effective parameters(std dev): 49.00(0.0014)
#> Number of equivalent replicates : 1.00 
#> 
#> Deviance Information Criterion (DIC) ...............: -223.40
#> Deviance Information Criterion (DIC, saturated) ....: 171.84
#> Effective number of parameters .....................: 85.92
#> 
#> Watanabe-Akaike information criterion (WAIC) ...: -244.37
#> Effective number of parameters .................: 50.22
#> 
#> Marginal log-Likelihood:  -255.40 
#> Posterior marginals for linear predictor and fitted values computed
tmeeha commented 5 years ago

Thanks for the help, Finn. Here is a tidy reproducible example for those wishing to run a CAR SVC model.

# libraries
suppressMessages(suppressWarnings(require(INLA)))
suppressMessages(suppressWarnings(require(brinla)))
suppressMessages(suppressWarnings(require(inlabru)))
suppressMessages(suppressWarnings(require(spdep)))
suppressMessages(suppressWarnings(require(sf)))
suppressMessages(suppressWarnings(require(rgdal)))

# demo data from spdep package
data(columbus)

# weights matrix for besag model
weights1 <- as(as_dgRMatrix_listw(nb2listw(col.gal.nb)), "CsparseMatrix")

# region index
columbus$index1<-1:nrow(columbus)

# make bru components
comp1 <- ~
  random_intercepts(map=index1, model="besag", graph=weights1, scale.model=T) +
  random_inc_slopes(map=index1, model="besag", graph=weights1, scale.model=T) +
  Intercept

# make bru formula with fixed intercept and random intercepts and slopes
form1 <- CRIME ~ -1 + random_intercepts + random_inc_slopes * INC + Intercept

# get bru result
bru1 <- bru(comp1, like(data=columbus, family="gaussian", formula=form1))
#> 
#> Max deviation from previous: 0.115% of SD [stop if: <1%]
#> Convergence criterion met, stopping INLA iteration.

# access results
bru1$summary.fixed
#>               mean          sd 0.025quant 0.5quant 0.975quant     mode kld
#> Intercept 39.84555 0.005403898   39.83493 39.84555   39.85616 39.84555   0

head(bru1$summary.random$random_intercepts)
#>   ID          mean          sd  0.025quant      0.5quant 0.975quant
#> 1  1 -1.945376e-05 0.015708241 -0.03350364 -1.279404e-05 0.03339562
#> 2  2 -2.024220e-05 0.013915304 -0.02968752 -1.306814e-05 0.02957546
#> 3  3 -1.715220e-05 0.012218184 -0.02606516 -1.111626e-05 0.02597015
#> 4  4 -2.471269e-05 0.010990003 -0.02346968 -1.535046e-05 0.02333371
#> 5  5 -5.075567e-06 0.007916265 -0.01687181 -3.722084e-06 0.01684310
#> 6  6 -4.332633e-06 0.011600394 -0.02471550 -3.664234e-06 0.02469032
#>            mode          kld
#> 1 -5.397210e-06 1.327386e-09
#> 2 -5.616423e-06 1.327274e-09
#> 3 -4.758988e-06 1.327288e-09
#> 4 -6.858770e-06 1.326854e-09
#> 5 -1.407826e-06 1.327644e-09
#> 6 -1.201519e-06 1.327846e-09

head(bru1$summary.random$random_inc_slopes)
#>   ID       mean          sd 0.025quant   0.5quant 0.975quant       mode
#> 1  1 -1.1837348 0.002644563 -1.1889333 -1.1837349 -1.1785409 -1.1837350
#> 2  2 -0.9440361 0.002592211 -0.9491288 -0.9440362 -0.9389477 -0.9440362
#> 3  3 -0.5150885 0.002656598 -0.5203110 -0.5150887 -0.5098709 -0.5150887
#> 4  4 -1.4424201 0.004172107 -1.4507782 -1.4424220 -1.4340520 -1.4424244
#> 5  5  1.0563355 0.002736137  1.0509522  1.0563354  1.0617128  1.0563356
#> 6  6 -0.7972328 0.002638272 -0.8024182 -0.7972329 -0.7920522 -0.7972328
#>            kld
#> 1 0.000000e+00
#> 2 0.000000e+00
#> 3 0.000000e+00
#> 4 4.273495e-10
#> 5 2.727023e-18
#> 6 6.053109e-19

bri.hyperpar.summary(bru1)
#>                                        mean          sd      q0.025
#> SD for the Gaussian observations 0.00993358 0.005726301 0.003684001
#> SD for random_intercepts         0.01000307 0.005840183 0.003635483
#> SD for random_inc_slopes         1.79514189 0.180347608 1.478137626
#>                                         q0.5     q0.975        mode
#> SD for the Gaussian observations 0.008343955 0.02527102 0.006205901
#> SD for random_intercepts         0.008382984 0.02564366 0.006247523
#> SD for random_inc_slopes         1.781097078 2.18519663 1.747918996

Here is a bit more code for graphics.

# plot results
plot(bru1, "Intercept")
plot(bru1, "random_intercepts")
plot(bru1, "random_inc_slopes")

# map values
columbus_shape <- as(readOGR(system.file("shapes/columbus.shp", package="spData")[1]), "sf")
columbus_shape$rand_slopes_mean <- bru1$summary.random$random_inc_slopes$mean
columbus_shape$rand_slopes_sd <- bru1$summary.random$random_inc_slopes$sd
plot(columbus_shape["rand_slopes_mean"])
plot(columbus_shape["rand_slopes_sd"])

Created on 2018-10-14 by the reprex package (v0.2.1.9000)

finnlindgren commented 3 years ago

Going through old issues. In the final example, there's a "-1" in the predictor expression that shouldn't be there. It could be in the component definition, but -1 + Intercept(1) is equivalent to +1 or just +Intercept(1) there. Since at least 2.1.15, inlabru now accepts the same scaling weight specification as inla, with the second parameter allowing optional weights (here INC). That also allows setting the formula to a shortcut "add all components", which also tells inlabru it's a pure additive model with no nonlinearities. Both versions are shown below:

Updated example code:

library(inlabru)
#> Loading required package: ggplot2
#> Loading required package: sp

suppressPackageStartupMessages(library(spatialreg))

data(columbus, package = "spdep")

# weights matrix for besag model
weights1 <- as(spatialreg::as_dgRMatrix_listw(spdep::nb2listw(col.gal.nb)),
               "CsparseMatrix")

# region index
columbus$index1<-1:nrow(columbus)

# make bru components
comp1 <- ~
  random_intercepts(index1, model="besag", graph=weights1, scale.model=TRUE) +
  random_inc_slopes(index1, model="besag", graph=weights1, scale.model=TRUE) +
  Intercept(1)

# make bru formula with fixed intercept and random intercepts and slopes
form1 <- CRIME ~ random_intercepts + random_inc_slopes * INC + Intercept

# get bru result
bru1 <- bru(comp1, like(data=columbus, family="gaussian", formula=form1))
#> Loading required namespace: INLA

summary(bru1)
#> inlabru version: 2.1.15.9000
#> INLA version: 20.12.10
#> Components:
#>   random_intercepts: Model types main='besag', group='exchangeable', replicate='iid'
#>   random_inc_slopes: Model types main='besag', group='exchangeable', replicate='iid'
#>   Intercept: Model types main='linear', group='exchangeable', replicate='iid'
#> Likelihoods:
#>   Family: 'gaussian'
#>     Data class: 'data.frame'
#>     Predictor: CRIME ~ random_intercepts + random_inc_slopes * INC + Intercept
#> Time used:
#>     Pre = 0.297, Running = 0.132, Post = 0.0394, Total = 0.468 
#> Fixed effects:
#>             mean    sd 0.025quant 0.5quant 0.975quant   mode kld
#> Intercept 38.846 0.005     38.835   38.846     38.856 38.846   0
#> 
#> Random effects:
#>   Name     Model
#>     random_intercepts Besags ICAR model
#>    random_inc_slopes Besags ICAR model
#> 
#> Model hyperparameters:
#>                                             mean       sd 0.025quant 0.5quant
#> Precision for the Gaussian observations 21093.27 2.05e+04    1673.36 1.52e+04
#> Precision for random_intercepts         19793.99 1.86e+04    1485.94 1.44e+04
#> Precision for random_inc_slopes             0.32 6.40e-02       0.21 3.14e-01
#>                                         0.975quant     mode
#> Precision for the Gaussian observations   7.51e+04 4702.946
#> Precision for random_intercepts           6.93e+04 4146.068
#> Precision for random_inc_slopes           4.61e-01    0.304
#> 
#> Expected number of effective parameters(stdev): 49.00(0.00)
#> Number of equivalent replicates : 1.00 
#> 
#> Deviance Information Criterion (DIC) ...............: -284.43
#> Deviance Information Criterion (DIC, saturated) ....: 114.10
#> Effective number of parameters .....................: 57.05
#> 
#> Watanabe-Akaike information criterion (WAIC) ...: -294.21
#> Effective number of parameters .................: 36.53
#> 
#> Marginal log-Likelihood:  -253.31 
#> Posterior marginals for the linear predictor and
#>  the fitted values are computed
#> 2.1.15.9000
#> 20.12.10
#> random_intercepts
#> besag
#> exchangeable
#> iid
#> random_inc_slopes
#> besag
#> exchangeable
#> iid
#> Intercept
#> linear
#> exchangeable
#> iid
#> gaussian
#> data.frame
#> CRIME ~ random_intercepts + random_inc_slopes * INC + Intercept

Using INC as internal weights in the component:

library(inlabru)
#> Loading required package: ggplot2
#> Loading required package: sp

suppressPackageStartupMessages(library(spatialreg))

data(columbus, package = "spdep")

# weights matrix for besag model
weights1 <- as(spatialreg::as_dgRMatrix_listw(spdep::nb2listw(col.gal.nb)),
               "CsparseMatrix")

# region index
columbus$index1<-1:nrow(columbus)

# make bru components
comp1 <- ~
  random_intercepts(index1, model="besag", graph=weights1, scale.model=TRUE) +
  random_inc_slopes(index1, INC, model="besag", graph=weights1, scale.model=TRUE) +
  Intercept(1)

# make bru formula with fixed intercept and random intercepts and slopes
# form1 <- CRIME ~ random_intercepts + random_inc_slopes + Intercept
# Since we now just want the additive model of all components, use a shortcut:
form1 <- CRIME ~ .

# get bru result
bru1 <- bru(comp1, like(data=columbus, family="gaussian", formula=form1))
#> Loading required namespace: INLA

summary(bru1)
#> inlabru version: 2.1.15.9000
#> INLA version: 20.12.10
#> Components:
#>   random_intercepts: Model types main='besag', group='exchangeable', replicate='iid'
#>   random_inc_slopes: Model types main='besag', group='exchangeable', replicate='iid'
#>   Intercept: Model types main='linear', group='exchangeable', replicate='iid'
#> Likelihoods:
#>   Family: 'gaussian'
#>     Data class: 'data.frame'
#>     Predictor: CRIME ~ .
#> Time used:
#>     Pre = 0.421, Running = 0.298, Post = 0.0465, Total = 0.766 
#> Fixed effects:
#>             mean    sd 0.025quant 0.5quant 0.975quant   mode kld
#> Intercept 38.846 0.005     38.835   38.846     38.856 38.846   0
#> 
#> Random effects:
#>   Name     Model
#>     random_intercepts Besags ICAR model
#>    random_inc_slopes Besags ICAR model
#> 
#> Model hyperparameters:
#>                                             mean       sd 0.025quant 0.5quant
#> Precision for the Gaussian observations 2.04e+04 2.08e+04   1447.697 1.42e+04
#> Precision for random_intercepts         1.82e+04 1.80e+04   1242.190 1.29e+04
#> Precision for random_inc_slopes         3.19e-01 6.40e-02      0.209 3.14e-01
#>                                         0.975quant     mode
#> Precision for the Gaussian observations   7.55e+04 3974.118
#> Precision for random_intercepts           6.61e+04 3401.673
#> Precision for random_inc_slopes           4.59e-01    0.305
#> 
#> Expected number of effective parameters(stdev): 49.00(0.00)
#> Number of equivalent replicates : 1.00 
#> 
#> Deviance Information Criterion (DIC) ...............: -280.89
#> Deviance Information Criterion (DIC, saturated) ....: 114.96
#> Effective number of parameters .....................: 57.48
#> 
#> Watanabe-Akaike information criterion (WAIC) ...: -289.74
#> Effective number of parameters .................: 37.64
#> 
#> Marginal log-Likelihood:  -253.39 
#> Posterior marginals for the linear predictor and
#>  the fitted values are computed
#> 2.1.15.9000
#> 20.12.10
#> random_intercepts
#> besag
#> exchangeable
#> iid
#> random_inc_slopes
#> besag
#> exchangeable
#> iid
#> Intercept
#> linear
#> exchangeable
#> iid
#> gaussian
#> data.frame
#> CRIME ~ .

Created on 2020-12-10 by the reprex package (v0.3.0)