add conditional_bernoulli distribution

goldingn commented 6 years ago

Add a distribution node type for this compound distribution, where the observed variable y can only take value 1 if a latent bernoulli variable z takes value 1, else it must take 0, i.e.:

y_i ~ bernoulli(z p_i) z ~ bernoulli(\psi)*

p_i = p(y_i = 1 | z = 1) \psi = p(z = 1)

Inference on p and psi is tractable when there are multiple bernoulli trials in each observation (where p take different values in each trial). The density of this compound distribution (i.e. the likelihood for y) can be calculated directly, explicitly integrating over the latent variable z.

This formulation underpins the ecological imperfect-detection model of MacKenzie et al. which gives the formulation of the likelihood/density.

For each observation in the imperfect detection model, y and p are vectors giving indicating whether a species was detected at each visit, and the probability of detection (which may vary between visits), and z and psi are scalars indicating whether the species was present (assumed the same at all visits) and the probability of being present.

The syntax would be:

distribution(y) <- conditional_bernoulli(p, psi)

(since the distribution is discrete, y can't be a variable), where y and p are vectors (or matrices) and psi is a scalar (or vector).

goldingn commented 6 years ago

there are extensions to this model, for example introducing a parameter q_i = p(y_i = 1 | z_i = 0), corresponding to erroneous detections in the imperfect detection case. These could be added via additional, optional parameters

jdyen commented 5 years ago

could this be setup to take more flexible inputs, e.g., for a model with constant p or constant psi across multiple individuals? z_i = bernoulli(\psi) or z_i = bernoulli(\psi_i) with y_ij = bernoulli(\psi * p) or y_ij = bernoulli(\psi * p_i) or y_ij = bernoulli(\psi * p_j)

So \psi could have dims (1 x 1) or (n x 1), and p could have dims (1 x 1), (n x 1), (1 x k), or (n x k), assuming y has dims (n x k).

My current way of handling this is:

# with a common psi for all individuals
psi <- beta(1, 1, dim = 1)
psi_expanded <- do.call('c', lapply(1:n, function(x) psi))

and

# with a single p for each individual
p <- beta(1, 1, dim = c(n, 1))
p_expanded <- do.call('cbind', lapply(1:k, function(x) p))

hrlai commented 3 years ago

Dear @goldingn and @jdyen , thank you pioneering these models in greta. I am exploring how to incorporate imperfect detection into abundance joint species distribution modelling after reading Tobler et al. (2019), which then led me to Yamaura et al. (2012).

However, I ran into an error while trying to implement Yamaura et al. (2012)'s model in greta:

Error: model contains a discrete random variable that doesn't have a fixed value, so cannot be sampled from

I think it is related to this thread, because I cannot use poisson for unobserved variable? This is also hinted in the helpfile. I hope to get some thoughts on whether I'm coding it incorrectly or is it something not yet implemented in greta?

Below is two code chunks, one to simulate abundance data and another to fit model (apologies if they look ugly!) :

Simulate data

# Simulate data to test fit_model.R script

# Following the data generative process in 
# Yamaura et al. (2012) Biodiversity and Conservation

set.seed(123)

# number of species
n_spp  <- 10   
# number of sites  
n_site <- 100 
# number of sampling events
t <- 10

# probability of detection for each species
# assuming the same p_j across sites for now
alpha0_true <- rnorm(n_spp, 2, 1)
mu_true     <- alpha0_true
sigma_true  <- rexp(n_spp, 2)
p_true      <- numeric(n_spp)
for (j in seq_len(n_spp)) {
    logit_p_true <- rnorm(1, mu_true, sigma_true)
    p_true[j] <- exp(logit_p_true) / (exp(logit_p_true) + 1)
}

# true abundance
# using Poisson for now (explore neg. binom. later)
beta0_true  <- rnorm(n_spp, 2, 1)
lambda_true <- exp(rep(1, n_site) %*% t(beta0_true))
N_true <- matrix(rpois(n_site * n_spp, lambda_true), n_site, n_spp)

# observed 
n_true <- array(NA, dim = c(n_site, n_spp, t))
for (i in seq_len(n_site)) {
    for (j in seq_len(n_spp)) {
        n_true[i, j, ] <- rbinom(t, N_true[i, j], p_true[j])
    }
}

Fit model

library(greta)

# source("code/sim_data.R")

# data
n <- as_data(n_true)

# variables
X <- ones(n_site)
V <- ones(n_site)

# priors
sd_beta0 <- exponential(2)
beta0    <- normal(0, sd_beta0, dim = n_spp)
lambda   <- exp(X %*% t(beta0))
N        <- poisson(lambda)    # I think this doesn't work as per helpfile
N <- do.call(abind, c(replicate(t, N, simplify = FALSE), along = 3))

sd_alpha0   <- exponential(2)
alpha0      <- normal(0, sd_alpha0, dim = n_spp)
mu          <- V %*% t(alpha0)
sigma       <- exponential(2, dim = n_spp)
sigma <- do.call(rbind, replicate(n_site, t(sigma), simplify = FALSE))
# p           <- ilogit(multivariate_normal(mu, sigma))
p           <- ilogit(normal(mu, sigma))
p <- do.call(abind, c(replicate(t, p, simplify = FALSE), along = 3))

# observational model
distribution(n) <- binomial(N, p)
mod <- model(N, p)

jdyen commented 3 years ago

I don't think this has been added to greta. A post on the greta forum might get some good answers (maybe do a quick search first, someone might have already posted a workaround).

hrlai commented 3 years ago

Thanks for the heads up. Sure I'll move over to the forum and apologies if here isn't the best place to ask :)

greta-dev / greta.multivariate

add conditional_bernoulli distribution #2

Simulate data

Fit model