I noticed a potential bug when predicting from a "bym" model. The reprex below requires the SpatialEpi package, which provides some areal unit data.

When predicting without supplying new data, the predictions seem sensible. However, when a data object is passed to predict(), the returned prediction data frame does not have sensible dimensions. I wonder if the indexing is breaking because a bym model is a bit different: the effect has length 2n, where n is the number of nodes in the graph. When predicting without a data object, it is possible to subset the result as required, as I do below.
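To make the 2n layout concrete, here is a minimal base R sketch. It uses simulated numbers rather than anything produced by INLA or inlabru, and it assumes the (v + u, u) ordering that the INLA documentation is said to use later in this thread. The names `n`, `v`, `u`, `latent`, `total` and `structured` are illustrative only.

```r
# Illustrative only: simulated values, not output from INLA or inlabru.
set.seed(1)
n <- 56                  # number of areas (nodes in the graph)
v <- rnorm(n, sd = 0.3)  # stands in for the unstructured (iid) component
u <- rnorm(n, sd = 0.5)  # stands in for the spatially structured component

# Assumed layout of the full latent vector: first n entries are v + u,
# last n entries are u on its own.
latent <- c(v + u, u)

total <- latent[seq_len(n)]           # the part that enters the linear predictor
structured <- latent[n + seq_len(n)]  # the second, "hidden" half

length(latent)
```

Subsetting with `seq_len(n)` instead of a hard-coded `1:56` keeps the same code valid if the number of areas changes.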

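# Fit BYM model to Scotland lip cancer dataset
library(INLA)
library(inlabru)
library(SpatialEpi)  # scotland data
library(sp)          # spTransform, CRS
library(spdep)       # poly2nb, nb2INLA
library(ggplot2)
library(patchwork)   # p1 + p2

# load data
data(scotland)
lip_data = scotland$spatial.polygon
lip_data$cases = scotland$data$cases
lip_data$county.names = scotland$geo$county.names
# region index for the BYM effect (the original column name was lost; region_id is a placeholder)
lip_data$region_id = 1:nrow(lip_data)
lip_data$E = scotland$data$expected   # exposure
lip_data = spTransform(lip_data,
                       CRS("+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=km +no_defs"))
# adjacency matrix
nb = poly2nb(lip_data)
nb2INLA("scot.adj", nb)
g = inla.read.graph(filename = "scot.adj")
# try BYM model

cmp = cases ~ Intercept(1) + 
  bym_eff(region_id,
          model = "bym",
          graph = g,
          scale.model = TRUE)

fit = bru(components = cmp,
          data = lip_data,
          E = lip_data$E,
          family = "poisson")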

# predict
pred = predict(fit, formula = ~ exp(Intercept_latent + bym_eff_latent))
# take first 56 which is u + v (see inla.doc("bym"))
lip_data$mean_pred_bym = pred$mean[1:56] * lip_data$E

vals = c(lip_data$cases, lip_data$mean_pred_bym)
p1 = ggplot() + 
  gg(lip_data,
     aes(fill = mean_pred_bym),
     alpha = 1) +
  scale_fill_viridis_c(limits = c(min(vals), max(vals))) + 
  theme_minimal()
p2 = ggplot() + 
  gg(lip_data,
     aes(fill = cases),
     alpha = 1) +
  scale_fill_viridis_c(limits = c(min(vals), max(vals))) + 
  theme_minimal()
p1 + p2

# try passing data to predict call
pred = predict(fit, 
               data = lip_data,
               formula = ~ exp(Intercept + bym_eff))
# try a data frame
pred_df = data.frame(region_id = 1:10)  # index column (name assumed; it must match the component's input)
pred = predict(fit, 
               data = pred_df,
               formula = ~ exp(Intercept + bym_eff))

The desired behaviour I guess would be for predict(fit, data = pred_obj, ~ some_function(bym_eff)) to return an object with 2 times nrow(pred_obj) rows. This would apply some_function() to both the u + v and v parts of the bym_eff parameters.

finnlindgren commented 2 years ago

For bym_eff_latent, one should get the 2n length vector, as it's explicitly asking for the entire latent vector. For just the plain effect vector, it should give a vector of length n (the second half is hidden). With bym_eff_eval, one can choose to access arbitrary parts of the latent field, by indexing into either the first or second part, or both. The predict call that generates length 2 does seem to be a bug.

finnlindgren commented 2 years ago

The plain bym_eff is meant to act like it does in the estimation, giving a vector of length n, which is the effect "visible" in INLA.

ASeatonSpatial commented 2 years ago

Good to know bym_eff would just return a vector of length n.

So the desired behaviour would be nrow(pred) = 56 in this example?

# try passing data to predict call
pred = predict(fit, 
               data = lip_data,
               formula = ~ exp(Intercept + bym_eff))
nrow(lip_data)
#> [1] 56
nrow(pred)
#> [1] 2

And nrow(pred) = 10 in this one?

# try a data frame
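pred_df = data.frame(region_id = 1:10)  # index column (name assumed; it must match the component's input)
pred = predict(fit, 
               data = pred_df,
               formula = ~ exp(Intercept + bym_eff))
nrow(pred_df)
#> [1] 10
nrow(pred)
#> [1] 2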

But nrow(pred) = 2 in both.
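The expectation here, one prediction row per row of the supplied data, can be written as a small guard. This is only a sketch: `check_pred_rows` is a hypothetical helper, not part of inlabru, and the toy data frames below stand in for real predict() output.

```r
# Hypothetical helper (not part of inlabru): TRUE when there is exactly
# one prediction row per row of the data passed to predict().
check_pred_rows <- function(pred, newdata) {
  nrow(pred) == nrow(newdata)
}

newdata <- data.frame(idx = 1:10)      # toy input; the column name is arbitrary
good <- data.frame(mean = rep(1, 10))  # stand-in for a correctly sized result
bad <- data.frame(mean = c(1, 1))      # stand-in for the 2-row result reported above

check_pred_rows(good, newdata)  # TRUE
check_pred_rows(bad, newdata)   # FALSE
```

With a fitted model, the same check could be wrapped in stopifnot() right after each predict() call.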

finnlindgren commented 2 years ago

The problem is that inla_f=TRUE doesn't get set/propagated to ibm_amatrix.bru_mapper_collect when using generate, so it expects list input instead of just the first vector. A workaround is to use _eval with a list in the predict call:

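# region_id is a placeholder for the index column used when fitting
pred = predict(fit,
               data = lip_data,
               formula = ~ exp(Intercept + bym_eff_eval(list(u = region_id))))
nrow(pred)
#> [1] 56

pred_df = data.frame(region_id = 1:10)
pred = predict(fit,
               data = pred_df,
               formula = ~ exp(Intercept + bym_eff_eval(list(u = region_id))))
nrow(pred)
#> [1] 10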

I need to think about whether the inla_f logic is such that it should be activated in generate calls; it should probably be renamed if that's the case, or a separate mechanism added to keep the logic predictable.

finnlindgren commented 2 years ago

Use bym_eff_eval(list(v=...)) to access the hidden component separately.

finnlindgren commented 2 years ago

Bug resolved. But I see in the bym and bym2 documentation that the naming of the latent components doesn't match the current internal mapper names (u,v): the inla documentation uses (v+u,u). Should the inlabru mapper names be changed to match the inla documentation, e.g. (vu,u)?