inlabru-org / inlabru

inlabru
https://inlabru-org.github.io/inlabru/
76 stars 21 forks source link

lgcp inference under preferential sampling #57

Closed stanbiryukov closed 5 years ago

stanbiryukov commented 5 years ago

Hi there- has anyone successfully applied the lgcp function under a preferential sampling assumption such that the sampling locations are dependent on the underlying Gaussian random field? Perhaps passing two likelihoods into bru? I've worked through the INLA example (http://www.r-inla.org/examples/case-studies/diggle09 ) and am wondering if there are any more convenient methods or examples floating around utilizing lgcp/bru syntax.

fbachl commented 5 years ago

Hi Stan, just recently I had a discussion on this with Maria Pennino, first author of a recently published paper about preferential sampling. The paper is available at: https://onlinelibrary.wiley.com/doi/abs/10.1002/ece3.4789 Her data are now available in the devel and backend-clean branches using data(shrimp). I've also written a small example on how to implement preferential sampling using two likelihoods (note that the backend-clean branch is required): `

Set things up

Due to a bug preferential sampling currently only works with the backend-clean branch of inlabru and not using the CRAN version. You can obtain it using

devtools::install_github("fbachl/inlabru", ref="backend-clean", dependencies = TRUE)

Once you installed inlabru, run

library(INLA)
library(inlabru)

Set default INLA values for tutorial session (faster inference but only empirical Bayes):

init.tutorial()

Load the data

Define the spatial domain using a mesh:

data(shrimp)
ggplot() + gg(shrimp$mesh) + gg(shrimp$hauls) + coord_equal()

Model setup

Integration points for LGCP

ips = ipoints(shrimp$mesh)

Matern model for SPDE covariance

matern <- inla.spde2.pcmatern(shrimp$mesh, 
                              prior.sigma = c(0.1, 0.01), 
                              prior.range = c(1, 0.01))

Latent components

cmp = ~ spde(map = coordinates, model = matern) + spdeCopy(map = coordinates, copy = "spde", model = matern, fixed = FALSE) + lgcpIntercept + Intercept

Likelihoods

lik1 =  like(components = cmp,
             data =  shrimp$hauls, 
             family = "poisson", 
             formula = catch ~ spde + Intercept)

lik2 =  like(components = cmp,
             data =  shrimp$hauls, 
             family = "cp",
             ips = ips,
             formula = coordinates ~ spdeCopy + lgcpIntercept)

Fit the model

fit = bru(components = cmp, lik1, lik2, options = list(max.iter = 1))

Results

pxl = pixels(shrimp$mesh)
fish.intensity = predict(fit, pxl, ~ exp(spde + Intercept))
ggplot() + gg(fish.intensity) + gg(shrimp$hauls, size = 0.5)

`

stanbiryukov commented 5 years ago

Hi Fabian, this is excellent thank you! To get the posterior abundance counts, we still apply the weights from ipoints, correct? Thanks for all your work on this package - I really find it useful.

fbachl commented 5 years ago

Hi Stan, I'm always happy when I can make a researcher's life easier :) Indeed, for abundance estimates you have to define an integration domain and create integration points using ipoints. Note that the code above is just a template for running this kind of model. Your abundance estimate will depend heavily on the integration point you pass on to bru because they imply that you assume the area was observed completely. Does that make sense to you?

stanbiryukov commented 5 years ago

That makes sense, you mean completely observed as in the domain of the study, right? Is there a place in the likelihood we would need to add the 'samplers' argument from lgcp or would I just need to make sure the ipoints are constrained to the observation field?

Ran through the code using the backend clean branch and got the following error at this step: lik2 = like(components = cmp, data = shrimp$hauls, family = "cp", ips = ips, formula = coordinates ~ spdeCopy + lgcpIntercept)

"Error in inla.set.hyper(model, section, hyper = hyper.default, debug = debug) : Unknown keyword in hyper' theta2 '. Must be one of theta theta1 b beta . In addition: Warning message: In INLA::f(xxx, ..., group = group, model = model) : Ignored argument model=spde2' in f() due to copy=spde'"

Lastly, to properly add a covariate, does that just go into the components syntax or is it passed to both likelihood formulas too?

stanbiryukov commented 5 years ago

I reinstalled/updated inla using the command from your dockerfile and the example is now working.

fbachl commented 5 years ago

Great! Maybe you had an outdated version of INLA?

Yes, by completely observed I mean the area which goes into the ips argument. That's usually the study area and the boundaries are often sort of arbitrary but they matter quite a bit. The samplers argument for lgcp is just another (more convenient way) of defining the area where the point process was observed. lgcp takes the samplers data and runs ipoints() on them and uses the output as ips.

stanbiryukov commented 5 years ago

Thanks Fabian - realize this is still a development branch but here's an error I encountered with coordinate names/projections when trying to add a covariate. Thoughts on what the correct approach should be? Obviously human pop doesn't make sense here as a covariate but was just trying out the syntax to get things working:

library(inlabru)
library(raster)
library(sp)
init.tutorial()
data(shrimp)
class(shrimp$hauls)
colnames(shrimp$hauls@coords) <- c("x", "y")
ggplot() + gg(shrimp$mesh) + gg(shrimp$hauls) + coord_equal()
colnames(coordinates(shrimp$hauls))
pop <- raster('gpw_v4_population_density_rev10_2010_2pt5_min.tif') # pop density: http://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-density-rev10/data-download
pop[is.na(pop[])] <- 0
pop = projectRaster(pop, crs = crs(shrimp$hauls))
popcov <- as(pop, "SpatialPixelsDataFrame")
colnames(popcov@coords)
gridded(popcov) <- TRUE
colnames(coordinates(popcov))

f.pop <- function(x,y) {
  spp <- SpatialPoints(data.frame(x=x,y=y))
  proj4string(spp) <- CRS(proj4string(popcov))
  v <- over(spp, elevcov)
  v[is.na(v)] <- 0
  return (v$layer)
}

# integration points
ips = ipoints(shrimp$mesh)
colnames(coordinates(ips))

matern <- inla.spde2.pcmatern(shrimp$mesh, prior.sigma = c(0.1, 0.01), prior.range = c(1, 0.01))

cmp1 = ~ spde(map = coordinates, model = matern) + spdeCopy(map = coordinates, copy = "spde", model = matern, fixed = FALSE) + lgcpIntercept + Intercept + beta.pop(map = f.pop(x, y), model = "linear")
cmp2 = ~ spde(map = coordinates, model = matern) + spdeCopy(map = coordinates, copy = "spde", model = matern, fixed = FALSE) + lgcpIntercept + Intercept + beta.pop(map = popcov, model = "linear")

lik1 =  inlabru::like(components = cmp2, data =  shrimp$hauls, family = "poisson", formula = catch ~ spde + Intercept)
lik2 =  inlabru::like(components = cmp2, data =  shrimp$hauls, family = "cp", ips = ips, formula = coordinates ~ spdeCopy + lgcpIntercept)
fit = bru(components = cmp2, lik1, lik2, options = list(max.iter = 1))

Utilizing the first component formula where the covariate is returned via a function I get the error:

Error in match.names(clabs, names(xi)) : names do not match previous names

And if I use the reprojected spatialpixelsdataframe directly as a covariate I see:

Error in over(points, emap) : identicalCRS(x, y) is not TRUE

I encountered this issue with lgcp last week and my workaround was to make sure all coordinates were named x and y everywhere but that doesn't seem to be working here unless I'm missing something?

finnlindgren commented 5 years ago

Generally speaking, it's safest if all inputs either use the same CRS, including the mesh. In particular, if one has a CRS the rest should also have one. Looks like the res an untangled mismatch somewhere.

On Tue, 22 Jan 2019, 18:52 stanbiryukov <notifications@github.com wrote:

Thanks Fabian - realize this is still a development branch but here's an error I encountered with coordinate names/projections when trying to add a covariate. Thoughts on what the correct approach should be? Obviously human pop doesn't make sense here as a covariate but was just trying out the syntax to get things working:

library(inlabru) library(raster) library(sp) init.tutorial() data(shrimp) class(shrimp$hauls) colnames(shrimp$hauls@coords) <- c("x", "y") ggplot() + gg(shrimp$mesh) + gg(shrimp$hauls) + coord_equal() colnames(coordinates(shrimp$hauls)) pop <- raster('gpw_v4_population_density_rev10_2010_2pt5_min.tif') # pop density: http://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-density-rev10/data-download pop[is.na(pop[])] <- 0 pop = projectRaster(pop, crs = crs(shrimp$hauls)) popcov <- as(pop, "SpatialPixelsDataFrame") colnames(popcov@coords) gridded(popcov) <- TRUE colnames(coordinates(popcov))

f.pop <- function(x,y) { spp <- SpatialPoints(data.frame(x=x,y=y)) proj4string(spp) <- CRS(proj4string(popcov)) v <- over(spp, elevcov) v[is.na(v)] <- 0 return (v$layer) }

integration points

ips = ipoints(shrimp$mesh) colnames(coordinates(ips))

matern <- inla.spde2.pcmatern(shrimp$mesh, prior.sigma = c(0.1, 0.01), prior.range = c(1, 0.01))

cmp1 = ~ spde(map = coordinates, model = matern) + spdeCopy(map = coordinates, copy = "spde", model = matern, fixed = FALSE) + lgcpIntercept + Intercept + beta.pop(map = f.pop(x, y), model = "linear") cmp2 = ~ spde(map = coordinates, model = matern) + spdeCopy(map = coordinates, copy = "spde", model = matern, fixed = FALSE) + lgcpIntercept + Intercept + beta.pop(map = popcov, model = "linear")

lik1 = inlabru::like(components = cmp2, data = shrimp$hauls, family = "poisson", formula = catch ~ spde + Intercept) lik2 = inlabru::like(components = cmp2, data = shrimp$hauls, family = "cp", ips = ips, formula = coordinates ~ spdeCopy + lgcpIntercept) fit = bru(components = cmp2, lik1, lik2, options = list(max.iter = 1))

Utilizing the first component formula where the covariate is returned via a function I get the error:

Error in match.names(clabs, names(xi)) : names do not match previous names

And if I use the reprojected spatialpixelsdataframe directly as a covariate I see:

Error in over(points, emap) : identicalCRS(x, y) is not TRUE

I encountered this issue with lgcp last week and my workaround was to make sure all coordinates were named x and y everywhere but that doesn't seem to be working here unless I'm missing something?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/fbachl/inlabru/issues/57#issuecomment-456518396, or mute the thread https://github.com/notifications/unsubscribe-auth/AG1Ls-ISSDs-W5KTqNuOVOR8ZW0nMxFJks5vF14IgaJpZM4aM-S1 .

stanbiryukov commented 5 years ago

Good point on the CRS of the mesh - that was the issue. The fit progresses after I added: shrimp$mesh$crs = inla.CRS(crs(shrimp$hauls))

stanbiryukov commented 5 years ago

Thanks again Finn and Fabian - this approach made the modeling process much more tractable.

finnlindgren commented 2 years ago

Updated example using updated syntax. TODO: include the further details from https://github.com/inlabru-org/inlabru/issues/57#issuecomment-456518396

library(INLA)
#> Loading required package: Matrix
#> Loading required package: foreach
#> Loading required package: parallel
#> Loading required package: sp
#> This is INLA_99.99.9999 built 2022-01-17 21:29:35 UTC.
#>  - See www.r-inla.org/contact-us for how to get help.
#>  - To enable PARDISO sparse library; see inla.pardiso()
library(inlabru)
library(ggplot2)

data(shrimp)

matern <- inla.spde2.pcmatern(shrimp$mesh, alpha = 2,
                              prior.sigma = c(0.1, 0.01),
                              prior.range = c(1, 0.01))

cmp = ~ spde(coordinates, model = matern) +
  spdeCopy(coordinates, copy = "spde", fixed = FALSE) +
  lgcpIntercept(1) +
  Intercept(1)

# The catch data has some non-integer value, so Poisson is inappropriate
# Using the pseudo-model "xpoisson" that allows non-integer observations instead
lik1 <- like(data =  shrimp$hauls,
             family = "xpoisson",
             formula = catch ~ spde + Intercept)

lik2 <- like(data =  shrimp$hauls,
             family = "cp",
             domain = list(coordinates = shrimp$mesh),
             formula = coordinates ~ spdeCopy + lgcpIntercept)

# Even though both likelihoods are log-linear, INLA needs some
# assistance in finding the modes; bru_max_iter = 3 seems sufficient here, but
# allowing more iterations is safer; this iteration stops at 4.
fit <- bru(components = cmp,
           lik1,
           lik2,
           options = list(bru_max_iter = 20,
                          bru_verbose = 4))
#> iinla: Iteration 1 [max:20]
#> iinla: |lin1-lin0| = 513
#>        <eta-lin1,delta>/|delta| = -1.282e-15
#>        |eta-lin0 - delta <delta,eta-lin0>/<delta,delta>| = 1.063e-14
#> iinla: Iteration 2 [max:20]
#> iinla: Step rescaling: 162%, Expand
#> iinla: Step rescaling: 100%, Overstep
#> iinla: Step rescaling: 100%, Optimisation
#> iinla: |lin1-lin0| = 0.1291
#>        <eta-lin1,delta>/|delta| = 1.217e-13
#>        |eta-lin0 - delta <delta,eta-lin0>/<delta,delta>| = 5.04e-11
#> iinla: Max deviation from previous: 6.06% of SD [stop if: <1%]
#> iinla: Iteration 3 [max:20]
#> iinla: Step rescaling: 100%, Optimisation
#> iinla: |lin1-lin0| = 0.02105
#>        <eta-lin1,delta>/|delta| = -2.479e-15
#>        |eta-lin0 - delta <delta,eta-lin0>/<delta,delta>| = 4.775e-12
#> iinla: Max deviation from previous: 0.953% of SD [stop if: <1%]
#> iinla: Convergence criterion met, running final INLA integration with known theta mode.
#> iinla: Iteration 4 [max:20]

pxl <- pixels(shrimp$mesh)
fish.intensity <- predict(fit, pxl, ~ exp(spde + Intercept))
ggplot() +
  gg(fish.intensity) +
  gg(shrimp$hauls, size = 0.5)

Created on 2022-01-23 by the reprex package (v2.0.1)