goldingn closed this issue 5 years ago
The user-facing syntax should look something like this (to match deSolve):
usage:
ode(y, times, func, ...)
mock example:
# a user-defined derivative function; t (time) and y (state) are required arguments, the rest are optional
deriv <- function (t, y, arg1, arg2) {
...
}
# some parameters of the ODE
arg1 <- normal(0, 3)
arg2 <- normal(0, 3)
# initial condition
initial_values <- normal(0, 1, dim = 3) # value of y at time t = 0
# times at which to evaluate the ode
times <- seq(5, 100, by = 5)
# integrate
results <- ode(initial_values, times, deriv, arg1, arg2)
returning a 20x3 greta array (results) of the values of y at each of the times.
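For comparison, the equivalent call in deSolve (the interface this is meant to mirror) looks like the sketch below; the logistic-growth derivative and the parameter values are purely illustrative:

```r
library(deSolve)

# derivative function in deSolve's convention: returns a list whose
# first element is the vector of derivatives
deriv <- function (t, y, parms) {
  list(parms$r * y * (1 - y / parms$K))
}

times <- seq(5, 100, by = 5)
out <- ode(y = c(y = 0.1), times = times, func = deriv,
           parms = list(r = 0.1, K = 1))
head(out)  # a matrix with a time column and one column per state variable
```

The key difference in the greta version is that the extra parameters are greta arrays passed via `...` rather than a `parms` list.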
Internally, tf_ode will need to define (in the tf environment) a wrapper function exposing only y and t, to allow R's lexical scoping to find the parameters.
We need a way of parsing a function that uses greta arrays and replacing it with a TF-friendly function. The ode method could then translate func (using lexical scoping to find the things it depends on) and pass this as an argument to TF's ODE solver.
One approach would be to execute the same function, but shim op()
to return the tf function:
op_shim <- function (operation, ..., dimfun = NULL, operation_args = list(), tf_operation = NULL, value = NULL) {
  if (is.null(tf_operation))
    tf_operation <- paste0("tf$", operation)
  # fetch the tensorflow function named by the string, e.g. "tf$add"
  eval(parse(text = tf_operation))
}
environment(func)$op <- op_shim
This would require passing on the relevant arguments too.
It might be tidier to add this tf-returning behaviour to op, with a return_tf argument or similar.
Hi Nick, awesome package.
I've plugged a system of differential equations into Tensorflow's odeint function and wrapped it as a greta operation to use in a model.
It was remarkably straightforward; however, I'm still stuck on inference. In the script greta_model.R:
- the growth_ode operation returns the values I expect in a greta array
- the samples of mu do not appear to move from their initial conditions, and I get SEs of ~0

Would you mind taking a look to see where I might've gone wrong?
While this is a bespoke example, mostly to get my head around how greta and TF work under the hood, I'd be happy to chip in if/when you devise a parser for ODE functions.
Many thanks, Andrew
Hey Andrew,
I'm glad you are finding it useful! By the look of your code, you've got the gist of how TF & greta talk to one another.
When I run MCMC, the sampler reports 100% bad, which means all proposals are being rejected because either the log density or its gradient evaluates to a non-finite value. Usually this is a numerical overflow issue, so it happens a bit at the beginning and then goes away as the sampler warms up. Here it's happening all the time, so the sampler isn't going anywhere.
To diagnose this, I plugged in some values for the parameters (transformed to an unconstrained scale) and returned the log density and its gradient by interfacing directly with the dag object within the model m:
m$dag$send_parameters(rep(0.1, S))
m$dag$log_density()
[1] -485619648
m$dag$gradients()
[1] NaN NaN NaN NaN NaN NaN NaN NaN NaN
The problem is that TF can't calculate the gradients for the integral, and therefore the gradients for the log density, because there's a discrete operation in the model.
Your best bet will probably be to try to recast the model without any discrete operations. If you do need to set those negative values to 0, you could try multiplying them by a sigmoid so that negative values are multiplied by a number close to 0, and positive values by a number close to 1:
dB <- B_delta * tf$sigmoid(B_delta)
When I swap that in for the original line the sampler works, albeit pretty slowly.
You can multiply the argument to the sigmoid by some number greater than 1 to make the sigmoid steeper and the mask closer to 0 and 1 (e.g. tf$sigmoid(B_delta * 10)), though that will increase the risk of rounding errors and bad samples.
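To illustrate the trick in plain R (a sketch, not greta/TF code): multiplying a value by a sigmoid of itself gives a smooth, differentiable stand-in for the hard cutoff max(x, 0):

```r
# smooth approximation to pmax(x, 0): negative values are multiplied by
# a number near 0, positive values by a number near 1
sigmoid <- function (x) 1 / (1 + exp(-x))
soft_relu <- function (x, k = 10) x * sigmoid(k * x)

x <- seq(-2, 2, by = 0.5)
cbind(hard = pmax(x, 0), soft = soft_relu(x))
# larger k makes the approximation tighter, at the cost of steeper
# gradients and a greater risk of rounding error
```

Unlike pmax(), the soft version has a well-defined, non-zero gradient everywhere, which is what the HMC sampler needs.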
One thing I need to do is warn users that the gradient isn't available. I'll open a new issue for that shortly.
Longer term, I'm planning to introduce samplers for models with discrete elements (there's a PR at #104, but it won't be ready very soon), though those are likely to be much less efficient than HMC.
Nick
Ahaa.. that would do it. Good spotting, and thanks for the suggestions. In this case, the discrete operation can be ditched with only minor changes in interpretation.
It does run slowly, surprisingly so given how fast the Tensorflow operation is (although there is a delay compiling/adding it to the graph):
S <- 9
mu_true <- tf$constant(seq(0.05, 0.09, length.out = S), shape = shape(S, 1))
a <- tf_growth_function(mu_true, S, 100L)
system.time(b <- sess$run(a))
takes 0.83s on my laptop, but sampling still takes ~10-15 min for 200 iterations. Am I underestimating the overhead of calculating log densities or gradients?
I think the slow-down is that each step of the HMC sampler has to try 10-20 different parameter sets and return the density and the gradient of the density (for the leapfrog integrator).
Adding that in, I would expect about 10s per iteration on solving the ODE:
g <- tf$gradients(a, mu_true)
sess <- tf$Session()
system.time(replicate(15, {sess$run(a); sess$run(g)}))
user system elapsed
13.534 7.099 9.473
The gradient takes about 6x as long to compute as the value. So your best bet for a speed-up would be to profile and optimise the code for the ODE as much as you can.
OK, thanks. This leads nicely to a question about how greta scales with compute resources. From the little digging I've done, Tensorflow's ConfigProto should spread things out over as many cores as I throw at it?
Yup, by default it should find all the cores and use as many as it can per operation, so you'll only get a speedup if an operation is big enough that it makes sense to split it up, e.g. large linear algebra ops.
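For reference, thread counts can also be set explicitly via ConfigProto when creating a session (TF 1.x API; the counts of 4 here are just illustrative):

```r
library(tensorflow)

# limit both within-op and between-op parallelism to 4 threads
config <- tf$ConfigProto(
  intra_op_parallelism_threads = 4L,
  inter_op_parallelism_threads = 4L
)
sess <- tf$Session(config = config)
```

intra_op controls how many threads a single op (e.g. a big matmul) can use; inter_op controls how many independent ops run concurrently.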
Here's a fairly advanced prototype of an ODE solver in greta: https://gist.github.com/goldingn/bea9e3dac12cbcb53e4e3fa8456d77bc
It works by executing the derivative function on mocked-up versions of the inputs, and using that to build a separate DAG object for the function. It then creates a tensorflow function of the derivative which runs by defining this sub-DAG in a new environment, taking pre-defined tensors. This function is then passed to tf$contrib$integrate$odeint() to get a tensor for the solution.
I'm planning to wrap this up as an extension package (greta.ode or similar) so that it doesn't clutter the main greta API.
Neat! That's so much tidier than coding up the derivative in tensorflow. What's the advantage of the sub-DAG?
The sub-DAG method is just the only way (I know) to convert from greta code to tensorflow code without writing a full-blown parser and translator to convert the function.
It would be interesting to see if there's a performance hit in this approach, compared with manually defining the tensorflow function as you did previously. There shouldn't be.
Cool. You mention it's still a little slow - one popular approach to these problems is to approximate the gradient of the derivative with GPs, splines or reproducing kernel Hilbert spaces (??) and estimate parameters from the approximation with fewer ODE solves. There's a nice review by Benn MacDonald and recent papers show it scales well to large systems with many states.
Maybe one day I could take a stab at it in greta, I'll put it on the list.
Oh nice, yeah approximations like that would be ace. They could live in the same extension package for solving ODEs in greta.
Have you seen GPflowOpt for Bayesian optimisation with GPs? It might not make sense to have that as a dependency, but it uses similar ideas and is already in tensorflow.
I've decided this solver will live in greta.dynamics.
The ode solver gist has been updated to work with the dev version of greta, so it is ready to go into greta.dynamics.
Implemented on the dev branch of greta.dynamics, so closing this.
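For anyone landing here later, usage in greta.dynamics looks roughly like the sketch below. The ode_solve() name and its argument order are based on the dev version described above and may differ in the released package; the logistic-growth model is purely illustrative:

```r
library(greta)
library(greta.dynamics)

# derivative of logistic growth; the greta array r is found by
# lexical scoping, as discussed earlier in this thread
derivative <- function (y, t, r) {
  r * y * (1 - y)
}

r <- uniform(0, 2)
y0 <- uniform(0, 0.1)
times <- seq(0.1, 5, by = 0.1)

# solve the ODE, returning a greta array of y at each time
y <- ode_solve(derivative, y0, times, r)
```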
A wrapper around the TensorFlow contributed module to solve ODEs.