CAR model development - Githubissues

ericward-noaa commented 2 years ago

Initial development in CAR branch:

To do list:

spatial and spatiotemporal components currently included, with the latter as IID. Does it make sense to also include AR(1)?
matrix.inverse() currently used to invert Q. Works quick with 50 regions -- but probably smarter way to do that if regions gets big
Need to include functionality for prediction, etc
Current test in 'test-car' shows difficulty in recovering sim parameters - need to double check cpp model
Currently assumes user passes in a single matrix containing number of neighbors (D) and adjacency (W) -- better to split into list?
Priors?
Sparse CAR model (M. Joseph Stan example) -- data differs in that eigenvectors are passed in as data
The 'CAR_neighbours' argument could also be constructed internally, but that'd require dependences on a few spatial packages that already do those calculations of W, etc

seananderson commented 2 years ago

Very cool! I look forward to experimenting with this and trying to figure out what's going on soon.

seananderson commented 2 years ago

I think there's an efficient TMB CAR model in here with a sparse matrix: https://github.com/mcruf/LGNB/blob/master/src/LGNB.cpp

I think this part: https://github.com/mcruf/LGNB/blob/89d571731451e4ae05d94ae65f35bddf01630f11/src/LGNB.cpp#L118-L120

The spatial covariance matrix Σ S , in contrast, was expressed according to Kristensen et al. (2014), which requires the discretization of the study area into a regular grid (hereby on a 5 × 5 km resolution; see Appendix S1: Fig. S1). The precision matrix Q (inverse of the covariance matrix) in this case is modeled via a first-order conditional autoregressive (CAR) process, which is neither stationary nor isotropic, but accounts for the complex geometry of the spatial area and is computationally more efficient due to the sparseness of the matrix.

That's from https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1002/eap.2453

With the covariance matrix described on page 329 here: https://cdnsciencepub.com/doi/pdf/10.1139/cjfas-2013-0151

Not sure if that's the same model as the one in the M. Joseph Stan example, but maybe it's helpful.

ericward-noaa commented 2 years ago

Thanks, I implemented the sparse matrix approach -- which makes more sense. brms() takes a slightly different approach, passing in eigenvectors as data -- but that's just because the sparse matrix methods aren't as far in Stan.

With the GMRF approach, I left the math the same as the traditional CAR model. Here Q is the precision matrix (Sigma = Q^{-1}), D is a diagonal matrix of neighbors (rowSums(W)), W is the adjacency matrix and tau and alpha are estimated. $Q = tau (D - alpha W)$

If tau = 1/sigma^2, then Q^{-1} is sigma^2 (D - alpha W) ^ {-1}, so maybe there's a more efficient way to model this on the TMB side.

Parameter issues Across a lot of simulated data, tau and alpha seem like they're currently estimated ok, but not perfectly. One issue could be the lack of prior (identifiability is a potential issue between them). Second, I think there's supposed to be a constraint on the spatial random effects summing to 0. I played with a hack-y approach for implementing this kind of constraint, but am thinking about it more: https://github.com/pbs-assess/sdmTMB/blob/0b9264d059d91e2f55768716313906819b2b797c/src/sdmTMB.cpp#L274

Comparisons to Kasper's approach: Their approach uses 'Q0', which is the same as the negative version of the 'CAR_neighbors' matrix I created (W as above, with D as the diagonal) https://github.com/mcruf/LGNB/blob/89d571731451e4ae05d94ae65f35bddf01630f11/R/LGNB_Rmodel.R#L475 The Q matrix that is constructed is slightly different I think from the conventional model, https://github.com/mcruf/LGNB/blob/89d571731451e4ae05d94ae65f35bddf01630f11/src/LGNB.cpp#L118

seananderson commented 2 years ago

Quickly: I believe this is a sum-to-zero constraint in TMB: https://github.com/mintoc/pnas_efm_paper/blob/ca66909e99d9552a4bad7116acdc3ca286e85963/dlm_ar1.cpp#L28-L30 although it beats me why this syntax works.

seananderson commented 2 years ago

I tried simulation testing over 50 iterations here: https://github.com/pbs-assess/sdmTMB/blob/CAR/inst/sim-test-car.R

The estimates look plausibly unbiased, although phi collapses to near zero in many cases even when set to 0.2.

ericward-noaa commented 2 years ago

Awesome - thanks for doing that. I'll simulation test it a bit more.

The sum to zero constraint is mostly an issue for the ICAR models (alpha = 1). The TMB example with the constraint above crashes, so looked for other other options. Aaron Osgood-Zimmerman's paper includes a discrete spatial simulation / estimation model (see Appendix C, p 60 https://arxiv.org/pdf/2103.09929.pdf). The code for the BYM example in his repo here: https://github.com/aaron-oz/tmb-inla-paper/tree/master/discrete_sim

ericward-noaa commented 2 years ago

I should have realized this -- but I think the issue before was the magnitude of spatial variance in the simulated data was a bit too small. Tried some other values and seems to recover the parameters pretty well (alpha also doesn't go to 0)

ericward-noaa commented 2 years ago

Just a note on where this is at: CAR model seems to be working as expected, and I've added optional beta priors on alphas. Also added better documentation for the model.

The last thing I need to fix is the spatiotemporal model, which is basically set up as IID CAR random effects. This line seems to be causing problems, and need to dig into why: https://github.com/pbs-assess/sdmTMB/blob/3df746ebbeb6c6d0240c1739587b639cf5ea5a49/src/sdmTMB.cpp#L284

seananderson commented 2 years ago

I stumbled on this https://methods-x.com/article/S2215-0161(22)00251-5/fulltext

pbs-assess / sdmTMB

CAR model development #68