inlabru-org / inlabru

inlabru
https://inlabru-org.github.io/inlabru/
78 stars 21 forks source link

Is the offset option working? #123

Closed sdaza closed 2 years ago

sdaza commented 2 years ago
> dat = data.table(
+     id = c(1:8),
+     deaths = c(5, 30, 2, 4, 5, 4, 7, 10),
+     pop = c(1000, 2300, 300, 400, 500, 700, 1000, 700)
+ )

# offset option
> m0 <- bru(deaths ~ 1, 
+     data = dat, family = "poisson", 
+     options = list(offset = log(dat$pop)))
> summary(m0)
inlabru version: 2.3.1.9000
INLA version: 21.11.01
Components:
  Intercept: Model types main='linear', group='exchangeable', replicate='iid'
Likelihoods:
  Family: 'poisson'
    Data class: 'data.table', 'data.frame'
    Predictor: deaths ~ .
Time used:
    Pre = 1.03, Running = 0.358, Post = 0.0459, Total = 1.43
Fixed effects:
           mean    sd 0.025quant 0.5quant 0.975quant  mode kld
Intercept 2.125 0.122      1.877    2.128      2.357 2.134   0

Deviance Information Criterion (DIC) ...............: 80.97
Deviance Information Criterion (DIC, saturated) ....: 109.64
Effective number of parameters .....................: 0.966

Watanabe-Akaike information criterion (WAIC) ...: 90.14
Effective number of parameters .................: 8.62

Marginal log-Likelihood:  -45.08
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

# without  offset
> m1 <- bru(deaths ~ 1, 
+     data = dat, family = "poisson")
> summary(m1)
inlabru version: 2.3.1.9000
INLA version: 21.11.01
Components:
  Intercept: Model types main='linear', group='exchangeable', replicate='iid'
Likelihoods:
  Family: 'poisson'
    Data class: 'data.table', 'data.frame'
    Predictor: deaths ~ .
Time used:
    Pre = 0.576, Running = 0.226, Post = 0.0269, Total = 0.83
Fixed effects:
           mean    sd 0.025quant 0.5quant 0.975quant  mode kld
Intercept 2.125 0.122      1.877    2.128      2.357 2.134   0

Deviance Information Criterion (DIC) ...............: 80.97
Deviance Information Criterion (DIC, saturated) ....: 109.64
Effective number of parameters .....................: 0.966

Watanabe-Akaike information criterion (WAIC) ...: 90.14
Effective number of parameters .................: 8.62

Marginal log-Likelihood:  -45.08
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
ASeatonSpatial commented 2 years ago

See ?bru_options I think the issue is that any options that do not begin with bru_ are passed to inla(options = (...)) but that is not where the offset call should go. So your model m2 (EDIT: I meant m0) would be something like

m2 = inla(deaths ~ 1,
          options = list(offset = log(dat$pop)),
          data = dat,
          family = "poisson")

in plain INLA which returns an error option offset not used. I'm not sure why the bru() call doesn't give an error. Maybe I'm getting the logic of the code wrong. I am sure @finnlindgren will have a view on this.

I think what you want is to end up with something like the following inla call:

m3 = inla(deaths ~ 1,
          offset = log(dat$pop),
          data = dat,
          family = "poisson")

which gives

> summary(m3)                                                                          

Call:                                                                                  
   "inla(formula = deaths ~ 1, family = \"poisson\", data = dat, offset =              
   log(dat$pop))"                                                                      
Time used:                                                                             
    Pre = 0.207, Running = 0.0501, Post = 0.0126, Total = 0.27                         
Fixed effects:                                                                         
              mean    sd 0.025quant 0.5quant 0.975quant   mode kld                     
(Intercept) -4.635 0.122     -4.883   -4.632     -4.403 -4.626   0                     

Expected number of effective parameters(stdev): 1.00(0.00)                             
Number of equivalent replicates : 8.00                                                 

Marginal log-Likelihood:  -20.34                    

I tried the following bru() call which didn't work:

m5 <- bru(deaths ~ 1 + offset(log(dat$pop)), 
          data = dat, 
          family = "poisson")
> m1 <- bru(deaths ~ 1 + offset(log(dat$pop)),                                         
         +           data = dat,                                                       
+           family = "poisson")                                                        
Error in `[.data.frame`(x, by) : undefined columns selected     

But this seems to do the trick:

m5 <- bru(formula = deaths ~ Intercept + offset(log(dat$pop)), 
          components = ~ Intercept(1),
          data = dat, 
          family = "poisson")
> summary(m5)
inlabru version: 2.3.1.9000
INLA version: 20.12.10
Components:
  Intercept: Model types main='linear', group='exchangeable', replicate='iid'
Likelihoods:
  Family: 'poisson'
    Data class: 'data.frame'
    Predictor: deaths ~ Intercept + offset(log(dat$pop))
Time used:
    Pre = 0.186, Running = 0.0545, Post = 0.0178, Total = 0.258 
Fixed effects:
            mean    sd 0.025quant 0.5quant 0.975quant   mode kld
Intercept -4.635 0.122     -4.883   -4.632     -4.403 -4.626   0

Expected number of effective parameters(stdev): 1.00(0.00)
Number of equivalent replicates : 8.00 

Deviance Information Criterion (DIC) ...............: 40.30
Deviance Information Criterion (DIC, saturated) ....: 10.99
Effective number of parameters .....................: 0.997

Watanabe-Akaike information criterion (WAIC) ...: 41.33
Effective number of parameters .................: 1.70

Marginal log-Likelihood:  -24.72 
Posterior marginals for the linear predictor and
 the fitted values are computed
finnlindgren commented 2 years ago

The intended way to include offsets in inlabru models is either to add them directly to the predictor, or to define offset components. However, there's seems to be a bug preventing that approach. There is however an easy workaround, as an offset is just a constant term added to the predictor expression, and inlabru allows arbitrary R expressions in the "formula", separate from the components definition. See below for what should have worked, and for what does work:

library(inlabru)

dat <- data.frame(
  id = c(1:8),
  deaths = c(5, 30, 2, 4, 5, 4, 7, 10),
  pop = c(1000, 2300, 300, 400, 500, 700, 1000, 700)
)

# Faulty versions that should have worked, and now work in new version:
m1 <- bru(deaths ~ 1 + myoffset(log(pop), model = "offset"),
         data = dat,
         family = "poisson")
m2 <- bru(~ 1 + myoffset(log(pop), model = "offset"),
         formula = deaths ~ Intercept + myoffset,
         data = dat,
         family = "poisson")

# OK versions:
m3 <- bru(components = ~ 1,
          formula = deaths ~ Intercept + log(pop),
          data = dat,
          family = "poisson")
m4 <- bru(components = ~ 1,
          formula = deaths ~ Intercept + log(.data.$pop),
          data = dat,
          family = "poisson")
# To verify that m3 and m4 aren't ignoring "pop":
m0 <- bru(deaths ~ 1,
          data = dat,
          family = "poisson")
finnlindgren commented 2 years ago

Bug source found; it's a regression bug introduced when support for inla models with hidden latent elements were added (such as "bym"). The test cases didn't catch it, and I've now also found some other inconsistencies in how offsets are handled. (specifically, my initial code fix made m2 in my example work, but not m1, even though the m1 definition is supposed to become exactly m2 insternally.)

finnlindgren commented 2 years ago

Note: I've updated the examples above to use log(pop) instead of pop. (The incorrect non-logged version breaks with inla.mode="experimental" due to numerical problems.)