greta-dev / greta

simple and scalable statistical modelling in R
https://greta-stats.org

Spatial random effect model issue #218

Closed BertvanderVeen closed 6 years ago

BertvanderVeen commented 6 years ago

I've been playing around with greta, trying to fit models with a spatial random effect. I've done this by forming a latent variable that follows a multivariate normal with a distance-structured covariance matrix.

Generally this works fine, but when I have repeated samples, I can't get the MCMC to start. It won't find starting values on its own, nor will it start when I supply initial values. Any tips or tricks on this front?

In addition (in the case of a multivariate normal), when specifying the mean vector as a greta array, I seem to get a column vector. When I re-specify the array with new dimensions, it turns into an operation node, and only once I tell greta it's actually a data node do I manage to get things to work. Am I doing something wrong, or is this expected behaviour at the moment?
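Roughly, a stripped-down sketch of what I mean (simplified and from memory, not my actual code):

x <- as_data(rnorm(3))   # greta arrays come out as 3 x 1 column vectors
dim(x) <- c(1, 3)        # reshaping to a row turns it into an operation node
# distribution(x) <- ... then refuses, because x is no longer a data node,
# and things only work once I force greta to treat the array as data again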

I must say, I'm starting to really like greta!

(will post some code at a later moment for future readers)

goldingn commented 6 years ago

Great, I'm glad you're enjoying it!

Please do post the troublesome code when you get a chance; I won't be able to help much otherwise.

Here's an example of a linear regression model with spatially-correlated residuals, which I think is what you are after:

# make fake spatial data
set.seed(2018-08-23)

n <- 100
k <- 3
coords <- matrix(runif(n * 2), ncol = 2)
X <- matrix(rnorm(n * k), ncol = k)

true_intercept <- rnorm(1, 0, 1)
true_beta <- rnorm(k, 0, 0.2)
true_range <- 0.8
true_mu <- true_intercept + X %*% true_beta
true_Sigma <- exp(-0.5 * as.matrix(dist(coords)) / true_range)

y <- MASS::mvrnorm(1, true_mu, true_Sigma)

# plot(coords, asp = 1, cex = exp(scale(y)))

# using the dev branch
# devtools::install_github("greta-dev/greta@dev")
library(greta)
#> 
#> Attaching package: 'greta'
#> The following objects are masked from 'package:stats':
#> 
#>     binomial, poisson
#> The following objects are masked from 'package:base':
#> 
#>     %*%, backsolve, beta, colMeans, colSums, diag, eigen,
#>     forwardsolve, gamma, rowMeans, rowSums, sweep, tapply

# parameters
range <- normal(0, 0.5, truncation = c(0, Inf))
beta <- normal(0, 10, k)
intercept <- normal(0, 3)

# spatial random effect
distance <- as.matrix(dist(coords))
Sigma <- exp(-0.5 * distance / range)

# mean
mu <- intercept + X %*% beta

# this is one realisation of a multivariate normal, so it needs to be a row
y_row <- t(y)
distribution(y_row) <- multivariate_normal(t(mu), Sigma)

m <- model(intercept, beta, range)

library(future)
plan(multisession)
draws <- mcmc(m,
              chains = 2,
              warmup = 2000,
              one_by_one = TRUE)  # handle cholesky errors
#> 
#> running 2 chains in parallel, each on up to 2 cores
#>

plot(draws)


# looks good, but we could do with some more samples for 'range'
coda::effectiveSize(draws)
#> intercept beta[1,1] beta[2,1] beta[3,1]     range 
#> 1822.9392 1203.1398 1206.9114 1253.6127  826.3138
draws <- extra_samples(draws,
                       n_samples = 1000,
                       one_by_one = TRUE)
#> 
#> running 2 chains in parallel, each on up to 2 cores
#> 
plot(draws)

coda::effectiveSize(draws)
#> intercept beta[1,1] beta[2,1] beta[3,1]     range 
#>  3688.805  2331.398  1933.940  2278.731  1701.291

# check posterior estimates
summary(draws)$statistics
#>                 Mean         SD     Naive SE Time-series SE
#> intercept -0.8342330 0.77911704 0.0123189221   0.0128257061
#> beta[1,1] -0.5279881 0.02053475 0.0003246829   0.0004258921
#> beta[2,1] -0.3297013 0.01533758 0.0002425084   0.0003500329
#> beta[3,1]  0.1286363 0.01677290 0.0002652028   0.0003817559
#> range      0.8705338 0.12348462 0.0019524633   0.0029941229
c(true_intercept, true_beta, true_range)
#> [1] -0.4742740 -0.5280850 -0.3082669  0.1422844  0.8000000

# plot correlated residuals (spatial correlation component)
resid <- y - mu
resid_draws <- calculate(resid, draws)
resid_mean <- summary(resid_draws)$statistics[, "Mean"]
plot(coords, asp = 1, cex = exp(scale(resid_mean)))

Created on 2018-08-23 by the reprex package (v0.2.0).

One reason you might be having trouble is if your coordinates are large (positive or negative) numbers. That can lead to numerical instability, meaning TensorFlow can't take the Cholesky factor of your matrix. You can rescale the coordinates to have mean 0 and standard deviation 1, which should help. See also the one_by_one argument to mcmc() and extra_samples(), which handles numerical errors, like failure to factorise or invert a matrix.
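For example, standardising the coordinates before building the distance matrix (plain base R, nothing greta-specific):

coords_scaled <- scale(coords)              # centre each column, divide by its sd
distance <- as.matrix(dist(coords_scaled))  # then build Sigma from this as before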

BertvanderVeen commented 6 years ago

Thanks for the detailed post; it definitely cleared up some of the confusion I was having. My apologies for not providing code earlier; I'm afraid I ran out of time while writing the post.

I decided to go for a model with an autoregressive term, while still including a spatially structured variance-covariance matrix.


library(greta)
set.seed(2018-08-24)

n_samples <- 30
n_locations <- 200

# fake coordinates
coords <- cbind(runif(n_locations), runif(n_locations))

# fake explanatories
X <- rnorm(n_locations)

true_intercept <- rnorm(1, 0, 1)
true_beta <- rnorm(1, 0, 2)
true_sig <- 2
true_phi <- 5

true_mu <- true_intercept + X * true_beta + rnorm(n_locations, 0, 2)

library(fields)

# distance matrix
H <- rdist(coords)
true_Sigma <- true_sig * exp(-true_phi * H)

# fake response
y <- MASS::mvrnorm(n_samples, true_mu, true_Sigma)

## the model ##

# global parameters
beta1 <- normal(0, 10)
intercept <- normal(0, 10)

# parameters for the correlation function
sig <- normal(0, 10)
phi <- normal(0, 10)

# spatial co-var matrix
Sigma <- sig * exp(-phi * H)

# AR coefficient
omega <- normal(0, 10)

# initiate mu
mu <- as_data(matrix(0, nrow = n_samples, ncol = n_locations))

So, at this point I want to iterate over the samples. The code below works fine for a small number of samples, but for a large number of samples the loop hangs and I don't get any further. From some of the other issues I understand that this is something you're working on, but do I have any other options at the moment to make this work?


# Keep X stable over time
X1 <- matrix(rep(X, each = n_samples), ncol = n_locations)

# Initiate a progress bar for the following loop
pb <- txtProgressBar(min = 1, max = n_samples, style = 3)

# Loop over time
for (i in 1:n_samples) {
  if (i == 1) {
    setTxtProgressBar(pb, 0)
    mu[i, ] <- t(intercept + X1[i, ] * beta1)
    setTxtProgressBar(pb, i)
  } else {
    mu[i, ] <- t(intercept + omega * t(mu[i - 1, ]) + X1[i, ] * beta1)
    setTxtProgressBar(pb, i)
  }
}

#Likelihood
distribution(y) <- multivariate_normal(mu, Sigma)

#model
m <- model(intercept, sig, phi, beta1, omega)

#MCMC sampling
library(future)
plan(multisession)
draws <- mcmc(m,
              chains = 3, n_samples = 2000,
              warmup = 1000, one_by_one = TRUE)
bayesplot::mcmc_trace(draws)

goldingn commented 6 years ago

Thanks for the interesting model Bert! There are two things going on here.

Firstly, there was a lot of extraneous recursion happening when defining greta arrays. Profiling this code helped me work out how to remove that, so model definition no longer hangs when recursing, provided you use the experimental branch I just pushed for this:

devtools::install_github("greta-dev/greta@remove_recursion")

However, it is still very slow to define the model, and that time is almost all due to TensorFlow trying to define the gradients of the joint density. That's because there are a lot of operations in the graph. Each of the transposes, extractions, and replacements (i.e. [, i]) is an operation, and TensorFlow needs to differentiate through them. That's obviously amplified when there are lots of operations inside a loop. There are a couple of things I can do to alleviate that with the current implementation of greta (#210, using more efficient TF extraction ops when possible), but until then, the best thing to do is to remove the loop from the model definition.

I think you can do that in your case by expanding out the recursion. Where:

eta <- intercept + X * beta1

you have, for each spatial location i:

mu_i1 = eta_i
mu_i2 = eta_i + omega * mu_i1 = eta_i * (1 + omega)
mu_i3 = eta_i + omega * mu_i2 = eta_i * (1 + omega + omega ^ 2)
...
mu_in = eta_i + omega * mu_i(n-1) = eta_i * (1 + omega + omega ^ 2 + ... + omega ^ (n - 1))

where eta_i is the i-th element of eta.

So vectorising that, I think you could define the model more efficiently like this:

omega_powers <- omega ^ seq_len(n_samples - 1)
multipliers <- 1 + cumsum(omega_powers)      # 1 + omega + ... + omega ^ t
mu_future <- kronecker(eta, t(multipliers))  # eta_i times each geometric sum
mu <- cbind(eta, mu_future)
distribution(y) <- multivariate_normal(t(mu), Sigma) 

(though I'm writing this in a hurry, so haven't checked my working!)

That leads to a much smaller graph (with the number of operations independent of the number of samples) and is defined much more quickly. It probably samples more quickly too.
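If you want to convince yourself the unrolled form matches the loop, here's a quick sanity check in plain R with made-up numeric values (separate from the greta model):

# compare the recursive and closed-form versions numerically
omega_val <- 0.6
eta_val <- rnorm(5)
n <- 4

mu_loop <- matrix(NA, 5, n)
mu_loop[, 1] <- eta_val
for (t in 2:n) mu_loop[, t] <- eta_val + omega_val * mu_loop[, t - 1]

multipliers <- 1 + cumsum(omega_val ^ seq_len(n - 1))
mu_closed <- cbind(eta_val, outer(eta_val, multipliers))

all.equal(mu_loop, unname(mu_closed))
#> [1] TRUE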

Doing the maths on that is a bit of a pain; the greta for-loop planned in #210 will make things easier for users.

goldingn commented 6 years ago

The remove_recursion branch is now merged into dev and has been deleted, so you'll need to update dev with: devtools::install_github("greta-dev/greta@dev")

BertvanderVeen commented 6 years ago

Thanks, Nick! This has been very helpful.

goldingn commented 6 years ago

No problem!