greta-dev / greta

simple and scalable statistical modelling in R
https://greta-stats.org

Dynamic Factor Type Model Won't Compile? #97

Closed jwdink closed 7 years ago

jwdink commented 7 years ago

Hopefully I'm not making some silly mistake here-- assuming that's not the case...

It seems like 'dynamic factor' style models (like the simplified Kalman filter below) won't compile: when I call model(), R just hangs.

library('greta')
times <- 1:5
Y <- c(-3.12368290898237, -2.37268902597084, -3.31757075118491, -2.0882385972953, -2.23854863652599)

sigma_measure <- lognormal(0, 1)
sigma_delta <- lognormal(0, 1);
sigma_position <- lognormal(0, 1);

latent_positions <- normal(-2, sigma_position)
latent_delta <- normal(0, sigma_delta)
for (t in unique(times)[-1]) {
  # random position-walk:
  latent_positions <- c(latent_positions, normal(latent_positions[t-1] + latent_delta, sigma_position) )
}
# observed position:
distribution(Y) <- normal(latent_positions, sigma_measure);

compiled_model <- model(latent_positions) # hangs

In contrast, everything seems to go well in Stan:

data {
  int nobs; // number of obs
  matrix[nobs,1] Y; //dataset of generated series
  real initial_y;
}
parameters {
  vector[nobs] latent_position;
  real<lower = 0> sigma_position;
  vector[nobs] latent_delta;
  real<lower = 0> sigma_delta;

  real<lower = 0> sigma_measure;
}
transformed parameters {
}
model {
  // priors
  latent_position[1] ~ normal(initial_y, 1);
  latent_delta[1] ~ normal(0, 1);
  Y[1,1] ~ normal(latent_position[1],sigma_measure);

  sigma_position ~ lognormal(0, 1);
  sigma_delta ~ lognormal(0, 1);
  sigma_measure ~ lognormal(0, 1);

  //
  for(t in 2:nobs) {
    latent_position[t] ~ normal(latent_position[t-1] + latent_delta[t-1], sigma_position);
    latent_delta[t] ~ normal(latent_delta[t-1], sigma_delta);
    Y[t,1] ~ normal(latent_position[t], sigma_measure);
  }

}
goldingn commented 7 years ago

Hi @jwdink,

Thanks for letting me know!

That's annoying. It runs fine for me though, takes less than a second to compile and converges quite nicely:

[screenshot, 2017-08-26: MCMC trace plots showing good convergence]

I'm using v0.2.2 (the current release at greta-dev/master), though I suspect this is some strange system issue rather than a version issue.

Presumably simple code compiles for you? E.g.:

m <- model(normal(0, 1))

What's the output of sessionInfo()?

goldingn commented 7 years ago

BTW, this model would be much more efficient (for larger data and in greta) if we could replace the for loop with cumsum(). Only I haven't provided a cumsum method for greta. D'oh!

I'll open that as a separate issue.
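For intuition, in plain base R (not greta): a Gaussian random walk built with a loop is just the cumulative sum of its independent increments, which is why cumsum() can replace the recursion:

```r
set.seed(1)
increments <- rnorm(5)

# loop version: each position is the previous one plus a new increment
walk_loop <- numeric(5)
walk_loop[1] <- increments[1]
for (t in 2:5) walk_loop[t] <- walk_loop[t - 1] + increments[t]

# vectorised version
walk_cumsum <- cumsum(increments)

all.equal(walk_loop, walk_cumsum)  # TRUE
```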

goldingn commented 7 years ago

(Also, you can fit these random-walk models as Gaussian processes, like here; I'll add a Wiener process covariance to gretaGP soon too, to make that easier.)

jwdink commented 7 years ago

Not fixed after upgrading to dev-version. :(

Here's my session info (from right before upgrading):

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] greta_0.2.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12      lattice_0.20-35   prettyunits_1.0.2 assertthat_0.2.0  digest_0.6.12     withr_2.0.0       grid_3.4.1       
 [8] R6_2.2.2          jsonlite_1.5      magrittr_1.5      coda_0.19-1       tfruns_0.9        progress_1.1.2    reticulate_1.0   
[15] devtools_1.13.3   tools_3.4.1       parallel_3.4.1    compiler_3.4.1    tensorflow_1.3    memoise_1.1.0    

I very recently installed greta (~1 week ago) and along with it tensorflow, so I doubt it's an issue with an old tensorflow version-- but let me know how I can get the version there if you think that's relevant.

EDIT: And yes, I can get other models to compile.

goldingn commented 7 years ago

hmmmm...

Did the simple example run? Did you restart the R session before re-trying?

You can get the tensorflow version greta is looking at with:

library(tensorflow)
tf$`__version__`

I have 1.2.0

goldingn commented 7 years ago

Does the following code (which requires greta >= v0.2.1) run without hanging?

dag <- .internals$inference$dag_class$new(list(latent_positions))
jwdink commented 7 years ago
  1. The simple example, and more complex ones, run nicely.
  2. I did try restarting the session, no luck.
  3. It looks like I've got TensorFlow 1.3.0. Should I try downgrading?
  4. The dag code ran without hanging.
goldingn commented 7 years ago

Hmmm, I just upgraded to 1.3.0 and that works fine for me too. You could try 1.2.0, but I suspect that isn't it.

You could try passing the following arguments to model():

compiled_model <- model(latent_positions, compile = FALSE, n_cores = 1L)

but I'm grasping at straws really

goldingn commented 7 years ago

I have the same versions of reticulate, tensorflow (R package) and R6.

goldingn commented 7 years ago

I expect something's bugging out whilst translating the greta model to tensorflow code, so the following code will probably hang too, but it would be worth checking:

dag <- .internals$inference$dag_class$new(list(latent_positions))
dag$define_tf()

Is there any CPU activity when the model is compiling, or is it just sitting there?

jwdink commented 7 years ago

I occasionally get the following message after aborting when it hangs:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  No such namespace: reticulate.

And yep, dag$define_tf() hangs too. There is definitely CPU activity.

goldingn commented 7 years ago

Hmm, yeah, I get those error messages when aborting too. That's part of reticulate's error handling, so I don't think it's connected.

How long have you tried leaving it for? I'm just wondering if it's running really slow, rather than stopping altogether.

Defining the tensorflow graph can be slow when there is recursion (for loops particularly), though this is a small one.

jwdink commented 7 years ago

I left it for a good 30 minutes-- I had assumed this was definitely long enough, just because everything else I'd ever tried compiled in seconds.

Sounds like we're stumped for now! But let me know if you have anything else you think I should try.

goldingn commented 7 years ago

Yeah, that's definitely too long to be normal behaviour.

One last-ditch attempt. Here's a version that removes the for loop by locally defining cumsum() for greta arrays. It reduces the model from 48 to 33 nodes, so there's a chance it has removed whatever the problem node was:

library (greta)
library (tensorflow)

op <- .internals$nodes$constructors$op
cumsum.greta_array <- function (x) 
  op("cumsum", x, tf_operation = tf$cumsum)

times <- 1:5
Y <- c(-3.12368290898237, -2.37268902597084, -3.31757075118491, -2.0882385972953, -2.23854863652599)
nobs <- length(Y)

sigma_measure <- lognormal(0, 1)
sigma_delta <- lognormal(0, 1);
sigma_position <- lognormal(0, 1);

latent_delta_sd <- c(ones(1), rep(sigma_delta, nobs - 1))
latent_delta <- normal(0, latent_delta_sd)
latent_delta <- cumsum(latent_delta)

latent_positions_sd <- c(ones(1), rep(sigma_position, nobs - 1))
latent_positions_mean <- c(-2, rep(0, nobs - 1))
latent_positions <- normal(latent_positions_mean, latent_positions_sd)
latent_positions <- cumsum(latent_positions) + latent_delta

# observed position:
distribution(Y) <- normal(latent_positions, sigma_measure);

compiled_model <- model(latent_positions)

This runs fine for me (and should be much quicker for big N).

jwdink commented 7 years ago

It works!!!

goldingn commented 7 years ago

Aha! Brilliant! Now I wonder why that is...

Since I can't replicate the bug, I'll close this issue. I'll put cumsum and cumprod in the next github release.

jwdink commented 7 years ago

I know this issue is closed, but just to clarify...

On the third-to-last line, you wrote:

latent_positions <- cumsum(latent_positions) + latent_delta

...but this should be...

latent_positions <- cumsum(latent_positions + latent_delta)

...right?

(Not trying to be pedantic, just wanting to make sure I'm specifying these kinds of models correctly.)

goldingn commented 7 years ago

Right, latent_delta had already been cumsum-ed, so my code was doing cumsum(a) + cumsum(b), but the stan code is equivalent to cumsum(a + cumsum(b)), which is what you have above. 👍
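A quick numeric check in plain R (toy vectors, not greta arrays) shows the two expressions really do differ:

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

cumsum(a) + cumsum(b)  # what the original greta code computed: 11 33 66
cumsum(a + cumsum(b))  # what the Stan model implies:           11 43 106
```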

goldingn commented 7 years ago

Also, thanks for trying greta out and reporting bugs! Feel free to post feature requests too :)

jwdink commented 7 years ago

For sure-- thanks for creating such a helpful package and being such an active maintainer!

BTW, it's worth noting that, with the cumsum modification above, the model does not sample well, and doesn't seem to converge. I've tried messing with the priors and setting inits, to no avail. The Stan equivalent does seem to sample well. I know HMC and NUTS aren't identical, but wasn't sure if there was an obvious change to the model you might recommend that would improve things.

goldingn commented 7 years ago

Hmmm, I'd be happy to take a look. Could you repost the poorly-converging greta version, to make sure we're on the same page?

goldingn commented 7 years ago

This version converges pretty well for me:

rm(list = ls())

library (greta)
library (tensorflow)

op <- .internals$nodes$constructors$op
cumsum.greta_array <- function (x) 
  op("cumsum", x, tf_operation = tf$cumsum)

times <- 1:5
Y <- c(-3.12368290898237, -2.37268902597084, -3.31757075118491, -2.0882385972953, -2.23854863652599)
nobs <- length(Y)

sigma_measure <- lognormal(0, 1)
sigma_delta <- lognormal(0, 1);
sigma_position <- lognormal(0, 1);

latent_delta_sd <- c(ones(1), rep(sigma_delta, nobs - 1))
latent_delta <- normal(0, latent_delta_sd)
latent_delta <- cumsum(latent_delta)

latent_positions_sd <- c(ones(1), rep(sigma_position, nobs - 1))
latent_positions_mean <- c(-2, rep(0, nobs - 1))
latent_positions <- normal(latent_positions_mean, latent_positions_sd)
latent_positions <- cumsum(latent_positions + latent_delta)

# observed position:
distribution(Y) <- normal(latent_positions, sigma_measure);

compiled_model <- model(latent_positions)

draws_list <- replicate(3, mcmc(compiled_model))
draws <- coda::mcmc.list(draws_list)

bayesplot::mcmc_trace(draws)

[screenshot, 2017-08-28: MCMC trace plots for the three chains]

sirt::mcmc.list.descriptives(draws)[, "Rhat"]
[1] 0.9981314 1.0045945 0.9994606 1.0115534 1.0017439
jwdink commented 7 years ago

I apologize, I'd forgotten I'd posted such a small subset of the data (just to demonstrate the model-compilation issue).

The sampling issues come with the full dataset:

library (greta)
library (tensorflow)

op <- .internals$nodes$constructors$op
cumsum.greta_array <- function (x) 
  op("cumsum", x, tf_operation = tf$cumsum)

times <- 1:92
Y <- c(-3.12, -2.37, -3.32, -2.09, -2.24, -1.7, -1.84, -2.24, -1.14, 
       -1.7, -1.22, -1.17, -1, -1.1, -1.32, -1.59, 0.73, -1.18, -0.58, 
       -1.01, 0.01, -1.28, -0.4, -0.15, -0.2, 0.69, -0.26, -1.13, 0.43, 
       0.62, 0.88, -0.25, -0.2, 0.61, 0.73, 0.99, -0.46, -1.23, -0.19, 
       -0.64, 0.1, -0.03, -0.72, -0.5, -0.2, -0.72, -0.18, -0.3, -0.16, 
       -1.17, -0.58, 0.81, -0.29, -0.52, -0.55, -0.25, 1.06, 0.09, 1.13, 
       0.72, 0.13, -0.36, 1.02, 0.74, 0.47, 0.6, -0.22, 1.25, 1.56, 
       0.01, 0.15, 1.28, -0.03, 0.32, 0.22, 0.86, 1.44, -0.5, 1.36, 
       0.77, 1.6, 0.16, 0.52, -0.52, 0.41, 1.77, 1.2, 0.86, 1.67, 1.2, 
       0.68, 1.95, 1.55, 1.96, 2.42, 1.83)
#times <- times[1:60]; Y <- Y[1:60] # as ts-length decreases, sampling improves
nobs <- length(Y)

sigma_measure <- lognormal(0, 1)
sigma_delta <- lognormal(0, 1);
sigma_position <- lognormal(0, 1);

latent_delta_sd <- c(ones(1), rep(sigma_delta, nobs - 1))
latent_delta <- normal(0, latent_delta_sd)
latent_delta <- cumsum(latent_delta)
# latent_delta <- zeros(nobs) # uncomment to make sampling work

latent_positions_sd <- c(ones(1), rep(sigma_position, nobs - 1))
latent_positions_mean <- c(-2, rep(0, nobs - 1))
latent_positions <- normal(latent_positions_mean, latent_positions_sd)
latent_positions <- cumsum(latent_positions + latent_delta)

# observed position:
distribution(Y) <- normal(latent_positions, sigma_measure);

compiled_model <- model(latent_positions)

draws_list <- replicate(3, mcmc(compiled_model, warmup = 250))
draws_subset <- coda::mcmc.list(purrr::map(draws_list, ~.x[,1:5]))
bayesplot::mcmc_trace(draws_subset)

I noticed that the problems increase as the length of the time series grows from the original subset of 5 to the full length, with no sharp point where sampling performance suddenly drops; instead there's a gradual degradation.

I also noticed that removing the 'delta' component makes the problems go away. However, sharpening the prior on sigma_delta does not help.

I didn't have any luck increasing warmup or specifying manual inits.

goldingn commented 7 years ago

Hi @jwdink, sorry I dropped the ball on this!

There are a couple of things going on here.

  1. Stan defines variables on the scale of the final versions of latent_delta and latent_positions, i.e. after doing the cumulative-sum operation, whereas greta defines variables on the scale of the first versions of latent_delta and latent_positions and then does the cumulative-sum operation to modify them. That doesn't affect the model, but it does affect the sampler, so it could be part of the difference in sampling efficiency for this particular model.

  2. Issue 1 might be interacting with the funnel-shaped posterior you sometimes get in hierarchical models. That can be ameliorated somewhat by using a non-centred reparameterisation of the model.

  3. Stan (by default) runs HMC on a 'diagonal Euclidean metric', which it estimates during the warmup period. In other words, sampling can be slow if the marginal posterior variances are higher for some parameters than for others, so Stan empirically estimates the marginal variance of each variable and adjusts the sampler so that steps are longer in the directions of variables with greater variance.
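As a minimal sketch of the reparameterisation in point 2 (using greta's normal() as in the scripts in this thread; x, mu and sigma are placeholder names): rather than drawing a variable directly on the model scale, draw it on the unit scale and transform it deterministically:

```r
# centred parameterisation (prone to funnel-shaped posteriors):
#   x <- normal(mu, sigma)

# non-centred equivalent: the sampler sees a unit-scale variable,
# and the shift and scale are applied as a deterministic operation
x_raw <- normal(0, 1)
x <- mu + sigma * x_raw
```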

On the dev branch of the goldingn fork, I have now implemented HMC on a diagonal Euclidean metric, with estimation of the metric during warmup. The fiddly bit was making it play nicely with the step-size adaptation. On that branch I have also added a chains argument to mcmc(), and exported the cumsum function. So here's a script that resolves issues 2 and 3 for your model:

# devtools::install_github("goldingn/greta@dev")
library (greta)

set.seed(2017-09-13)

times <- 1:92
Y <- c(-3.12, -2.37, -3.32, -2.09, -2.24, -1.7, -1.84, -2.24, -1.14,
       -1.7, -1.22, -1.17, -1, -1.1, -1.32, -1.59, 0.73, -1.18, -0.58,
       -1.01, 0.01, -1.28, -0.4, -0.15, -0.2, 0.69, -0.26, -1.13, 0.43,
       0.62, 0.88, -0.25, -0.2, 0.61, 0.73, 0.99, -0.46, -1.23, -0.19,
       -0.64, 0.1, -0.03, -0.72, -0.5, -0.2, -0.72, -0.18, -0.3, -0.16,
       -1.17, -0.58, 0.81, -0.29, -0.52, -0.55, -0.25, 1.06, 0.09, 1.13,
       0.72, 0.13, -0.36, 1.02, 0.74, 0.47, 0.6, -0.22, 1.25, 1.56,
       0.01, 0.15, 1.28, -0.03, 0.32, 0.22, 0.86, 1.44, -0.5, 1.36,
       0.77, 1.6, 0.16, 0.52, -0.52, 0.41, 1.77, 1.2, 0.86, 1.67, 1.2,
       0.68, 1.95, 1.55, 1.96, 2.42, 1.83)
nobs <- length(Y)

sigma_measure <- lognormal(0, 1)
sigma_delta <- lognormal(0, 1);
sigma_position <- lognormal(0, 1);

latent_delta_sd <- c(ones(1), rep(sigma_delta, nobs - 1))
latent_delta_raw <- normal(0, 1, dim = nobs)
latent_delta <- latent_delta_raw * latent_delta_sd
latent_delta <- cumsum(latent_delta)

latent_positions_sd <- c(ones(1), rep(sigma_position, nobs - 1))
latent_positions_mean <- c(-2, rep(0, nobs - 1))
latent_positions_raw <- normal(0, 1, dim = nobs)
latent_positions <- latent_positions_mean + latent_positions_raw * latent_positions_sd
latent_positions <- cumsum(latent_positions + latent_delta)

# observed position:
distribution(Y) <- normal(latent_positions, sigma_measure);

compiled_model <- model(latent_positions)

draws <- mcmc(compiled_model, warmup = 1000, n_samples = 1000, chains = 2)
bayesplot::mcmc_trace(draws, pars = paste0("latent_positions", 1:5))

It's better, but not great. I'll see if I can optimise the sampler some more. Note the increased number of warmup iterations needed to get good estimates of the metric; Stan runs 2000 iterations by default, half of them warmup.

jwdink commented 7 years ago

Thanks for following up! Really appreciate how much you've looked into this.

Oddly enough, though, when I run your exact code, both chains are flat (i.e., give the same value for all samples). I set the seed in the same way too...

This is a new computer, so freshly installed tensorflow etc.

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] greta_0.2.2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12      bindr_0.1         magrittr_1.5      progress_1.1.2    munsell_0.4.3     colorspace_1.3-2  lattice_0.20-35   R6_2.2.2          rlang_0.1.2      
[10] stringr_1.2.0     plyr_1.8.4        dplyr_0.7.3       tools_3.4.1       bayesplot_1.3.0   parallel_3.4.1    grid_3.4.1        gtable_0.2.0      coda_0.19-1      
[19] tfruns_0.9.1      digest_0.6.12     lazyeval_0.2.0    assertthat_0.2.0  tibble_1.3.4      tensorflow_1.3.1  bindrcpp_0.2      reshape2_1.4.2    ggplot2_2.2.1    
[28] glue_1.1.1.9000   labeling_0.3      stringi_1.1.5     compiler_3.4.1    scales_0.5.0      prettyunits_1.0.2 reticulate_1.1    jsonlite_1.5      pkgconfig_2.0.1  
> tensorflow::tf_version()
[1] ‘1.3’
goldingn commented 7 years ago

Ouch, yeah, it looks like I broke the sampler adaptation on the dev branch since I posted that; I'm working on running multiple chains in parallel.

This commit is what I would have been referring to, and it now runs for this model: devtools::install_github("goldingn/greta@ddd3a54")

I'll get this all patched up and merged into a working release. Thanks for spotting this and letting me know!

goldingn commented 7 years ago

Bug now fixed on devtools::install_github("goldingn/greta@dev"), in case you want to play with the new evaluate() feature :)