greta-dev / greta

simple and scalable statistical modelling in R
https://greta-stats.org

add ode solver method #70

goldingn closed this issue 5 years ago

goldingn commented 7 years ago

A wrapper around the TensorFlow contributed module to solve ODEs.

goldingn commented 7 years ago

the user-facing syntax should look something like this (to match deSolve):

usage:

ode(y, times, func, ...)

mock example:

# a user-defined ODE. t (current time) and y (current state) are required arguments; the rest are optional
deriv <- function (t, y, arg1, arg2) {
  ...
}

# some parameters of the ODE
arg1 <- normal(0, 3)
arg2 <- normal(0, 3)

# initial condition
initial_values <- normal(0, 1, dim = 3)  # value of y at time t = 0

# times at which to evaluate the ode
times <- seq(5, 100, by = 5)

# integrate
results <- ode(initial_values, times, deriv, arg1, arg2)

This returns a 20x3 greta array (results) of the values of y at each of the times.

Internally, tf_ode will need to define (in the tf environment) a wrapper function exposing only y and t, so that R's lexical scoping can find the parameters.
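As a sketch of that wrapper idea in plain R (make_wrapper() is a hypothetical helper, not greta code), the extra parameters are baked into a closure so the solver only ever sees y and t:

```r
# hypothetical helper: capture extra arguments in a closure so the
# solver-facing function exposes only the state y and the time t
make_wrapper <- function (func, ...) {
  extra_args <- list(...)
  function (y, t) {
    do.call(func, c(list(t, y), extra_args))
  }
}

# usage with the deriv() from the mock example above:
# wrapped <- make_wrapper(deriv, arg1, arg2)
# wrapped(y, t) then calls deriv(t, y, arg1, arg2)
```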

goldingn commented 7 years ago

Need a way of parsing a function using greta arrays and replacing it with a TF-y function. The ode method could then translate func (with lexical scoping to find the things it depends on) and pass this as an argument to TF's ODE solver.

One approach would be to execute the same function, but shim op() to return the tf function:

op_shim <- function (operation, ..., dimfun = NULL,
                     operation_args = list(),
                     tf_operation = NULL, value = NULL) {
  if (is.null(tf_operation))
    tf_operation <- paste0("tf$", operation)
  get(tf_operation)
}
environment(func)$op <- op_shim

This would require passing on the relevant arguments too.

It might be tidier to add this tf-returning behaviour to op, with a return_tf argument or similar.

aornugent commented 6 years ago

Hi Nick, awesome package.

I've plugged a system of differential equations into Tensorflow's odeint function and wrapped it as a greta operation to use in a model.

It was remarkably straightforward; however, I'm still stuck on inference in the script greta_model.R.

Would you mind taking a look to see where I might've gone wrong?

While this is a bespoke example, mostly to get my head around how greta and TF work under the hood, I'd be happy to chip in if/when you do devise a parser for ODE functions.

Many thanks, Andrew

goldingn commented 6 years ago

Hey Andrew,

I'm glad you are finding it useful! By the look of your code, you've got the gist of how TF & greta talk to one another.

When I run MCMC, the sampler says 100% bad, which means all proposals are being rejected because either the log density or its gradient evaluates to a non-finite value. Usually this is a numerical overflow issue, so it happens a bit at the beginning and then goes away as the sampler warms up. Here it's happening all the time, so the sampler isn't going anywhere.

To diagnose this, I plugged in some values for the parameters (transformed to an unconstrained scale) and returned the log density and its gradient by interfacing directly with the dag object within the model m:

m$dag$send_parameters(rep(0.1, S))
m$dag$log_density()
[1] -485619648
m$dag$gradients()
[1] NaN NaN NaN NaN NaN NaN NaN NaN NaN

The problem is that TF can't calculate the gradients for the integral, and therefore the gradients for the log density, because there's a discrete operation in the model.


Your best bet will probably be to try to recast the model without any discrete operations. If you do need to set those negative values to 0, you could try multiplying them by a sigmoid so that negative values are multiplied by a number close to 0, and positive values by a number close to 1:

dB <- B_delta * tf$sigmoid(B_delta)

When I swap that in for the original line, the sampler works, albeit pretty slowly.

You can multiply the argument to the sigmoid by some number greater than 1 to make the sigmoid stronger and the mask closer to 0 and 1 (e.g. tf$sigmoid(B_delta * 10)), though that will increase the risk of rounding errors and bad samples.
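The effect of sharpening the mask can be checked in plain R, without TensorFlow (soft_mask() is just an illustrative helper; k is the sharpness factor):

```r
# a sigmoid mask: ~0 for negative x, ~1 for positive x, smooth in between
soft_mask <- function (x, k = 1) 1 / (1 + exp(-k * x))

x <- c(-2, -0.1, 0.1, 2)
x * soft_mask(x, k = 1)   # gentle mask: small negatives leak through
x * soft_mask(x, k = 10)  # sharper mask: closer to pmax(x, 0), but
                          # with steeper gradients and more rounding risk
```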


One thing I need to do is warn users that the gradient isn't available. I'll open a new issue for that shortly.

Longer term, I'm planning to introduce samplers for models with discrete elements (there's a PR at #104, but it won't be ready very soon), though those are likely to be much less efficient than HMC.

Nick

aornugent commented 6 years ago

Ahaa.. that would do it. Good spotting, and thanks for the suggestions. In this case, the discrete operation can be ditched with only minor changes in interpretation.

It does run slowly, surprisingly so given how fast the Tensorflow operation is (although there is a delay compiling/adding it to the graph):

S = 9
mu_true = tf$constant(seq(0.05, 0.09, length.out = S), shape = shape(S, 1))
a <- tf_growth_function(mu_true, S, 100L)

system.time(b <- sess$run(a))

takes 0.83s on my laptop, but still ~10-15min for 200 iterations. Am I underestimating the overhead of calculating log densities or gradients?

goldingn commented 6 years ago

I think the slow-down is that each step of the HMC sampler has to try 10-20 different parameter sets and return the density and the gradient of the density (for the leapfrog integrator).

Adding that in, I would expect about 10s per iteration on solving the ODE:

g <- tf$gradients(a, mu_true)
sess <- tf$Session()
system.time(replicate(15, {sess$run(a); sess$run(g)}))
   user  system elapsed 
 13.534   7.099   9.473

The gradient takes about 6x as long to compute as the value, so your best bet for a speed-up would be to profile and optimise the code for the ODE as much as you can.
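As a back-of-envelope check of that per-iteration figure (numbers taken from the timings above; the leapfrog count is an assumption):

```r
# one value + gradient evaluation, from the replicate(15, ...) timing above
per_pair <- 9.473 / 15        # ~0.63s elapsed per (density, gradient) pair

# assume ~15 leapfrog steps per HMC iteration (typically 10-20)
leapfrog_steps <- 15
per_iteration <- leapfrog_steps * per_pair  # ~9.5s, matching "about 10s"
```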

aornugent commented 6 years ago

OK, thanks. This leads nicely to a question about how greta scales with compute resources. From the little digging I've done, Tensorflow's ConfigProto should spread things out over as many cores as I throw at it?

goldingn commented 6 years ago

Yup, by default it should find all the cores and use as many as it can per operation, so you'll only get a speedup if the operation is big enough that it makes sense to split it up, e.g. large linear algebra ops.
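From R, those thread pools can be capped via ConfigProto (a sketch for TF 1.x sessions; the thread counts are illustrative):

```r
library(tensorflow)

# ConfigProto settings controlling TF's two thread pools
config <- tf$ConfigProto(
  intra_op_parallelism_threads = 4L,  # threads used within a single op
  inter_op_parallelism_threads = 2L   # independent ops run concurrently
)
sess <- tf$Session(config = config)
```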

goldingn commented 6 years ago

Here's a fairly advanced prototype of an ODE solver in greta: https://gist.github.com/goldingn/bea9e3dac12cbcb53e4e3fa8456d77bc

It works by executing the derivative function on mocked-up versions of the inputs, and using that to build a separate DAG object for the function. It then creates a tensorflow function of the derivative which runs by defining this sub-DAG in a new environment, taking pre-defined tensors. This function is then passed to tf$contrib$integrate$odeint() to get a tensor for the solution.
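The "execute the function on mocked-up inputs" idea can be illustrated with a toy tracer in base R (this is not greta's implementation, just the general trick of recording the operations a function applies):

```r
# toy mock objects that record, rather than compute, the operations
# applied to them via the Ops group generic
mock <- function (name) structure(list(expr = name), class = "mock")
Ops.mock <- function (e1, e2) {
  expr_of <- function (x) if (inherits(x, "mock")) x$expr else deparse(x)
  mock(paste0("(", expr_of(e1), " ", .Generic, " ", expr_of(e2), ")"))
}

deriv <- function (t, y) y * 2 - t
traced <- deriv(mock("t"), mock("y"))
traced$expr  # "((y * 2) - t)" - the recorded computation
```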

I'm planning to wrap this up as an extension package (greta.ode or similar) so that it doesn't clutter the main greta API.

aornugent commented 6 years ago

Neat! That's so much tidier than coding up the derivative in tensorflow. What's the advantage of the sub-DAG?

goldingn commented 6 years ago

The sub-DAG method is just the only way I know of to convert from greta code to tensorflow code without writing a full-blown parser and translator for the function.

It would be interesting to see if there's a performance hit in this approach, compared with manually defining the tensorflow function as you did previously. There shouldn't be.

aornugent commented 6 years ago

Cool. You mention it's still a little slow - one popular approach to these problems is to approximate the gradient of the derivative with GPs, splines, or reproducing kernel Hilbert spaces (??), and to estimate parameters from the approximation with fewer ODE solves. There's a nice review by Benn MacDonald, and recent papers show it scales well to large systems with many states.

Maybe one day I could take a stab at it in greta, I'll put it on the list.

goldingn commented 6 years ago

Oh nice, yeah approximations like that would be ace. They could live in the same extension package for solving ODEs in greta.

Have you seen GPflowOpt for Bayesian optimisation with GPs? It might not make sense to have that as a dependency, but it uses similar ideas and is already in tensorflow.

goldingn commented 6 years ago

I've decided this solver will live in greta.dynamics

goldingn commented 5 years ago

The ode solver gist has been updated to work with the dev version of greta, so it is ready to go into greta.dynamics.

goldingn commented 5 years ago

Implemented on the dev branch of greta.dynamics, so closing this.