Closed MichaelClerx closed 6 years ago
Yeah, I suspect things like CVODES (note the S) can get this for you fairly simply...
Hmmm. I guess that would make it outside the scope of Pints though, apart from the bit where we provide an interface to get the information in! Sanmitra was talking about something similar though, so I'll have a look at doing this for cell models anyway...
Take a look at this solver package; they have got the sensitivity stuff done: http://www.jmodelica.org/assimulo_home/_modules/assimulo/examples/cvode_with_initial_sensitivity.html
I'll try to get this in Myokit, and in Jonathan's new solver back-end once I know how that works. But neither are Pints issues so closing this ticket!
By the way, if anyone needs it in a hurry, Myokit already has a (slow, not cvode-based) method to do this: http://docs.myokit.org/api_simulations/PSimulation.html
And chaste can do it via the automatic jacobian generator, right?
Chaste's Jacobian is for the RHS (dy/dt = f(t, y) = [f_1, f_2, f_3, ...]) to speed up ODE solving, so it looks like [df_1/dy_1, df_1/dy_2, etc.]. That's not really anything to do with the derivatives of outputs (or 'auxiliary functions' like current) with respect to parameters p, i.e. [dy_1/dp_1, dy_1/dp_2, etc.].
Depends on what pints is supposed to be doing for us? But I would have thought this was in scope if it is supposed to be evaluating likelihoods and doing things with them (Fisher should be second derivative of -log likelihood w.r.t. parameters).
Oh ok. Well if you don't mind forward Euler you can get dy/dp (or dI/dp) in Myokit for the time being :-)
Pints is not an ODE solver (and I'm pretty sure it shouldn't be). It expects users to provide external models that have a method simulate(self, parameters, times) that returns a series of y(t) evaluations. These are then passed to Pints using the following interface, which your forward model should be able to implement: https://github.com/martinjrobins/pints/blob/master/pints/_core.py#L14-L50
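As a concrete illustration of that interface, a user model could look like the sketch below. The model class is hypothetical (shown as a plain class rather than subclassing the actual pints.ForwardModel base), using an exponential decay ODE with a known analytical solution so no solver is needed:

```python
import math

# Hypothetical minimal model implementing the interface described above:
# Pints only asks for a simulate(parameters, times) method returning a
# series of y(t) values (plus n_parameters() in the real base class).
class ExponentialDecayModel:
    """dy/dt = -k * y, solved analytically: y(t) = y0 * exp(-k * t)."""

    def n_parameters(self):
        return 2  # y0 and k

    def simulate(self, parameters, times):
        y0, k = parameters
        return [y0 * math.exp(-k * t) for t in times]


model = ExponentialDecayModel()
values = model.simulate([2.0, 0.5], [0.0, 1.0, 2.0])
```

Pints can then optimise or sample over this model without ever knowing how the simulation was carried out.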
We don't want to write yet another ODE/PDE/Everything solver.
So what we can do is add an interface for a model that can also provide dy/dp|t evaluations, but it shouldn't be Pints' responsibility to calculate them.
I think I want Pints to go from dy/dp and the Hessian to the likelihood and Fisher information, and to provide some kick to whatever solvers are underneath to say which parameters etc. we are interested in at the moment...
As long as we can come up with a sensible interface for it. At the moment, Pints has no notion of a solver and I think that's a good thing!
At the moment we have a separate class for models with sensitivities.
See also #343 and #344
An alternative is to add a grad1() method and a grad2() method to LogPdf etc., and have them raise a NotImplementedError by default. The grad1() method could then return either dL/dp or (L, dL/dp).
Not sure which is best!
(One thing we'd lose is that you can use the LogLikelihood classes as functions)
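A rough sketch of what that alternative could look like. The class and method names follow the suggestion above and are hypothetical, not existing Pints API:

```python
# Base class: grad1()/grad2() raise NotImplementedError by default,
# subclasses override them where sensitivities are available.
class LogPdf:
    def __call__(self, p):
        raise NotImplementedError

    def grad1(self, p):
        # Could return either dL/dp or (L, dL/dp); not implemented by default
        raise NotImplementedError

    def grad2(self, p):
        raise NotImplementedError


class GaussianLogPdf(LogPdf):
    """Toy example: L(p) = -0.5 * p^2, so dL/dp = -p."""

    def __call__(self, p):
        return -0.5 * p * p

    def grad1(self, p):
        # Using the (L, dL/dp) return convention mentioned above
        return self(p), -p


pdf = GaussianLogPdf()
```

A method needing first-order information could then catch the NotImplementedError at run time, which is the main contrast with construction-time type checks.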
Just chatting to @sanmitraghosh about how good 1st order methods will need to be if we want to see an overall speed up.
Using the IKr model as an example, we have ns=2 states and np=8 parameters (ignoring g and E). Adding sensitivities then turns a 2-state system into a 2 + 2*8 = 18 state one (because you need to integrate all the sensitivities), i.e. 2*x RHS evaluations become approximately 2*x + 2*x*8 evaluations. (For auto-diff this is exact, although the speeds vary; using analytical solutions you'll get some simplifications and some complications, so I'm assuming it balances out.) If the solver efficiency is determined by the number of states to integrate, this means both the solver code and the RHS evaluation code will slow down by a factor of approximately ns / (ns + ns*np) = 1 / (1 + np). That means the method will have to be (1 + np) (= 9 for IKr) times more efficient if we want to see any benefit.
(Note: we're really after d output / d params, not d state / d params, but I'm assuming the output depends on every state.)
Any thoughts @ben18785 and @mirams ? Am I missing something?
A lot of methods will use finite differences (or similar) to do something like gradient descent if they don't have sensitivities, so those will typically need to run the model at the current parameters, and then plus a bit and minus a bit on each parameter, so will take (1 + 2*np = 17) evaluations of the ordinary ODE system to get a basic approximation for the gradient. So if you can get the exact gradient in 9 times the effort it is probably a win!
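Putting the two evaluation counts side by side for IKr (ns=2, np=8 as above), as a quick sanity check of the arithmetic:

```python
# Quick sanity check of the evaluation counts discussed above, for IKr
ns = 2   # states
np_ = 8  # parameters (ignoring g and E)

# Forward sensitivities: the augmented system has ns + ns*np states,
# so the cost grows by a factor of (ns + ns*np) / ns = 1 + np
augmented_states = ns + ns * np_
slowdown = augmented_states // ns

# Finite differences: one baseline solve, plus one up and one down per
# parameter, each of the ordinary (ns-state) system
fd_evaluations = 1 + 2 * np_
```

So the sensitivity approach costs roughly 9 ordinary-system-equivalents per gradient versus 17 solves for central finite differences, before accounting for accuracy.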
That's a good point!
Not sure how it works out for sampling (MCMC methods do 1 evaluation per iteration, but I suppose you'd have to look at the effective sample size?).
For optimisation there are definitely methods that re-use information (e.g. Nelder-Mead and other simplex methods that move 1 vertex at a time).
The particle ones currently use 4 + floor(3 * ln(np)) as a heuristic for the evaluations per iteration, which works out as 10 per iteration for IKr.
(same kind of thing for CVODE having an Analytic Jacobian versus multiple calls to the RHS to approximate it)
Here are two proposals:
The methods work with a LogPdf that they optimise or sample from:
LogPdf
__call__(p) --> L(p)
LogPdfS1
__call__(p) --> L(p), L'(p)
LogPdfS2
__call__(p) --> L(p), L'(p), L''(p)
The main example is:
KnownNoiseLogLikelihood
__call__(p) --> L(p)
KnownNoiseLogLikelihoodS1
__call__(p) --> L(p), L'(p)
KnownNoiseLogLikelihoodS2
__call__(p) --> L(p), L'(p), L''(p)
Note at this point there's potential for a bit of code duplication, as the S1 version does everything the bare version does, and the S2 version does everything the S1 version does.
The situation is very similar for an error measure E(p).
All our noise-model based LogLikelihoods extend the ProblemLogLikelihood class (but user-defined ones and toy ones don't have to).
ProblemLogLikelihood
__call__(p) --> L(p), calls Problem.__call__(p)
ProblemLogLikelihoodS1
__call__(p) --> L(p), L'(p), calls ProblemS1.__call__(p)
ProblemLogLikelihoodS2
__call__(p) --> L(p), L'(p), L''(p), calls ProblemS2.__call__(p)
Our problem classes would then be defined as:
(Single/MultiOutput)Problem
evaluate(p) --> y(p), calls ForwardModel.simulate(p, times)
(Single/MultiOutput)ProblemS1
evaluate(p) --> y(p), y'(p), calls ForwardModelS1.simulate(p, times)
(Single/MultiOutput)ProblemS2
evaluate(p) --> y(p), y'(p), y''(p), calls ForwardModelS2.simulate(p, times)
Note that y(p) here is typically the solution to an ODE: y(p) = integrate(f(p), y0).
With associated model classes:
ForwardModel
simulate(p, times) --> y(p)
ForwardModelS1
simulate(p, times) --> y(p), y'(p)
ForwardModelS2
simulate(p, times) --> y(p), y'(p), y''(p)
For ODEs, this model formulation makes a lot of sense, as y'(p) will often be calculated in tandem with y(p) (e.g. using CVODE to integrate f(p) and f'(p)). This is true irrespective of how f'(p) was obtained (autodiff or analytical).
In this scenario, we can check at construction time that everything is going to work, and users only need to implement a single simulate method.
However, there is scope for some duplicate code.
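To make Proposal 1 concrete, a ForwardModelS1 implementation might look like the sketch below, with simulate() returning values and sensitivities together in one call. The class is hypothetical (not existing Pints code), and uses a toy decay model whose sensitivities are known analytically so no solver is needed:

```python
import math

# Hypothetical ForwardModelS1 following Proposal 1: simulate() returns both
# y(p) and y'(p). Toy model: y(t) = y0 * exp(-k * t).
class DecayModelS1:
    def n_parameters(self):
        return 2  # y0 and k

    def simulate(self, parameters, times):
        y0, k = parameters
        y = [y0 * math.exp(-k * t) for t in times]
        # Sensitivities per time point: (dy/dy0, dy/dk)
        dy = [(math.exp(-k * t), -t * y0 * math.exp(-k * t)) for t in times]
        return y, dy
```

In a CVODE-backed model the two return values would come from one integration of the augmented system, which is exactly why returning them together is attractive.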
Situation 1: Method wants jacobian, model doesn't provide it.
A method that wants a jacobian can check the type of LogPdf it was given (so incompatible models are detected at construction time).
Situation 2: Model has f'(x) and/or f''(x), method doesn't need it.
The model can't be used! The user will have to make a new, simpler wrapper.
This isn't as bad as it sounds: you don't want this overhead anyway if it isn't needed.
You can also see this as the method warning the user of unneeded overhead!
Least-effort?: The situation for methods without sensitivities doesn't change at all, which is good. The interface to users is simple and there is no hidden stuff (e.g. caching) making Pints harder to understand.
In a slight variation, we could do this:
ForwardModel
simulate(p, times) --> y(p)
ForwardModelS1
simulate_s1(p, times) --> y(p), y'(p)
ForwardModelS2
simulate_s2(p, times) --> y(p), y'(p), y''(p)
This would allow the creation of model wrappers implementing both ForwardModel (which defines simulate) and ForwardModelS1 (which defines simulate_s1), and that use the appropriate simulation method when needed.
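Such a wrapper could be sketched as follows (hypothetical names, following the simulate()/simulate_s1() variation above):

```python
# Hypothetical wrapper implementing both interfaces: plain simulate() for
# ForwardModel users, and simulate_s1() for callers wanting sensitivities.
class LinearModelBoth:
    """Toy model y(t) = a * t, so dy/da = t."""

    def simulate(self, parameters, times):
        (a,) = parameters
        return [a * t for t in times]

    def simulate_s1(self, parameters, times):
        (a,) = parameters
        return [a * t for t in times], [t for t in times]


model = LinearModelBoth()
```

A sensitivity-free method would only ever call simulate(), so it pays no overhead for the extra capability.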
The methods work with a LogPdf that they optimise or sample from:
LogPdf
__call__(p) --> L(p)
jacobian(p) --> L'(p)
hessian(p) --> L''(p)
The main example is:
KnownNoiseLogLikelihood
__call__(p) --> L(p)
jacobian(p) --> L'(p)
hessian(p) --> L''(p)
Note that the calculations of L(p), L'(p), and L''(p) are fairly independent, as long as the simulated results are stored.
For the problem class, we get
ProblemLogLikelihood
__call__(p) --> L(p), calls Problem.__call__(p) OR uses cached result
jacobian(p) --> L'(p), calls Problem.jacobian(p) OR uses cached result
hessian(p) --> L''(p), calls Problem.hessian(p) OR uses cached result
At this point, we could do some caching to ensure that no double simulations are run, although we could also leave that to the problem class:
(Single/MultiOutput)Problem
evaluate(p) --> y(p), calls ForwardModel.values OR uses cached result
jacobian(p) --> y'(p), calls ForwardModel.jacobian OR uses cached result
hessian(p) --> y''(p), calls ForwardModel.hessian OR uses cached result
Again, we could leave the caching for a lower level:
ForwardModel
simulate(p, times) --> y(p), y'(p), y''(p) :: DEFINED BY USER
values(p, times) --> calls simulate or uses cached result :: DEFINED BY PINTS
jacobian(p, times) --> calls simulate or uses cached result :: DEFINED BY PINTS
hessian(p, times) --> calls simulate or uses cached result :: DEFINED BY PINTS
In this version, all models implement the simulate() method to return three parts (with the unsupported parts being None for models that don't provide jacobians or hessians).
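A minimal sketch of that lowest-level caching, assuming the cache is keyed on the last (parameters, times) pair. All names are hypothetical; the caching policy is the point:

```python
# simulate() is "DEFINED BY USER" and returns all three parts;
# values()/jacobian()/hessian() are the Pints-defined accessors that
# reuse the last result when called again with the same inputs.
class CachingModel:
    def __init__(self):
        self._cache_key = None
        self._cache = None
        self.n_simulations = 0  # counts real simulations, to show the caching

    def simulate(self, parameters, times):
        # Toy model y(t) = a * t with dy/da = t
        self.n_simulations += 1
        (a,) = parameters
        y = [a * t for t in times]
        dy = [t for t in times]
        return y, dy, None  # no hessian provided by this model

    def _result(self, parameters, times):
        key = (tuple(parameters), tuple(times))
        if key != self._cache_key:
            self._cache = self.simulate(parameters, times)
            self._cache_key = key
        return self._cache

    def values(self, parameters, times):
        return self._result(parameters, times)[0]

    def jacobian(self, parameters, times):
        return self._result(parameters, times)[1]

    def hessian(self, parameters, times):
        return self._result(parameters, times)[2]
```

Calling values() and then jacobian() with the same inputs triggers only one simulation, which is the double-simulation problem the caching is meant to solve.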
Alternatively, we could still have a split at the model level
ForwardModel
simulate(p, times) --> y(p) :: DEFINED BY USER
values(p, times) --> calls simulate or uses cached result
ForwardModelS1
simulate(p, times) --> y(p), y'(p), y''(p) :: DEFINED BY USER
values(p, times) --> calls simulate or uses cached result
jacobian(p, times) --> calls simulate or uses cached result
ForwardModelS2
simulate(p, times) --> y(p), y'(p), y''(p) :: DEFINED BY USER
values(p, times) --> calls simulate or uses cached result
jacobian(p, times) --> calls simulate or uses cached result
hessian(p, times) --> calls simulate or uses cached result
This puts the caching issue at the lowest level, which avoids a lot of trouble. It does mean that if people want to implement something at one of the higher levels (i.e. bypass the Problem/Model classes, like we do for the toy problems), they don't benefit from it.
Situation 1: Method wants jacobian, model doesn't provide it.
The call to jacobian would trickle down to the model level, where an AttributeError would be raised because the method doesn't exist.
Situation 2: Model has f'(x) and/or f''(x), method doesn't need it.
In this case, the jacobian (and hessian) will be calculated anyway, as the model has no way of knowing that these methods aren't wanted. (If we create some way, we lose the advantages of having a simple user interface and caching in the Pints classes).
Again, not a huge deal though, as it would make more sense for people to write a model appropriate to what they want to do (i.e. not calculate unneeded stuff).
Least-effort?: A lot of Pints code stays the same, except we add on methods everywhere. It is trickier for the user though, as they need to know about the caching to understand what's going on. The model interface also gets more complicated, even if you don't use sensitivities.
Curious to hear your thoughts. Sorry for the massive post @martinjrobins @ben18785 @sanmitraghosh @chonlei @mirams
I have two options as well. Both of these are inspired by Michael's option 2)
Making the split at the model level, and using add-on methods for higher-level classes, seems like a good (i.e. simple) approach. I'm not keen on complicating the model interface for non-sensitivity models; could you have something like this, so that ForwardModel remains as simple as possible?
ForwardModel
values(p, times) --> y(p) :: DEFINED BY USER
ForwardModelS1
simulate(p, times) --> y(p), y'(p) :: DEFINED BY USER
values(p, times) --> calls simulate or uses cached result
jacobian(p, times) --> calls simulate or uses cached result
ForwardModelS2
simulate(p, times) --> y(p), y'(p), y''(p) :: DEFINED BY USER
values(p, times) --> calls simulate or uses cached result
jacobian(p, times) --> calls simulate or uses cached result
hessian(p, times) --> calls simulate or uses cached result
Yes, it does look a bit odd that users define values for non-sensitivity models, and simulate for sensitivity models, but if we are agreed that sensitivities will rarely be used, then for the vast majority of the time users will only implement ForwardModel, and not have to worry about the added complexity of ForwardModelS1 and ForwardModelS2.
The caching sounds complicated and error-prone. Perhaps we can have the models return some sort of object that has values, jacobian and hessian functions defined (or not defined).
ForwardModel
simulate(p, times) --> ForwardModelResult(p, times)
ForwardModelResult
values() --> y(p)
ForwardModelS1
simulate(p, times) --> ForwardModelS1Result(p, times)
ForwardModelS1Result
values() --> y(p)
jacobian() --> y'(p)
ForwardModelS2
simulate(p, times) --> ForwardModelS2Result(p, times)
ForwardModelS2Result
values() --> y(p)
jacobian() --> y'(p)
hessian() --> y''(p)
So effectively you still have the same "caching" behaviour, since the results objects would probably calculate everything in the constructor, but this would be explicit and not hidden.
The downside of this is added complexity in the design for the forward models, but perhaps you could just ask the user to implement the constructor of the results object, then use some fancy Python decorator magic to generate the above design.
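A sketch of the result-object idea, where everything is computed once in the result's constructor so the "caching" is explicit rather than hidden (hypothetical names following the listing above, using a toy linear model):

```python
# The result object does all the work up front in its constructor and then
# just exposes accessors; simulate() merely constructs it.
class ForwardModelS1Result:
    def __init__(self, parameters, times):
        # Toy model y(t) = a * t, with dy/da = t, computed up front
        (a,) = parameters
        self._values = [a * t for t in times]
        self._jacobian = [t for t in times]

    def values(self):
        return self._values

    def jacobian(self):
        return self._jacobian


class LinearModelS1:
    def simulate(self, parameters, times):
        return ForwardModelS1Result(parameters, times)
```

Because the result is tied to one (parameters, times) pair, there is no cache invalidation to reason about.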
ForwardModel()
n_derivatives() --> 0, 1, or 2
simulate() --> y, (y, y'), or (y, y', y'')
XOutputProblem()
n_derivatives() --> model.n_derivatives()
evaluate() --> y, (y, y'), or (y, y', y'')
Then, we need to check n_derivatives in the ErrorMeasure and ProblemLogLikelihood classes. At this level we can pass on n_derivatives() from the problem, OR the highest one that we've implemented for that particular measure/loglikelihood. I think it's ok to do this here because the error measures / loglikelihoods already need to know about sensitivities.
@martinjrobins suggests in this method we could add an optional argument, Model.simulate(derivatives=0), that indicates the level of derivatives we want. Most models can ignore this argument, and the Problem classes can check if the requested level of derivatives is available...
Based on that, I propose
ForwardModel
simulate(p, times, derivatives=0) --> y
simulate(p, times, derivatives=1) --> (y, y')
simulate(p, times, derivatives=2) --> (y, y', y'')
n_derivatives() --> The maximum number of derivatives available (default=0)
XOutputProblem
evaluate(p, derivatives=0) --> y
evaluate(p, derivatives=1) --> (y, y') or an exception, if Model.n_derivatives() < 1
evaluate(p, derivatives=2) --> (y, y', y'') or an exception, if Model.n_derivatives() < 2
n_derivatives() --> The maximum number of derivatives available (passed on from Model)
This lets methods decide which version of a model they want (if models support this)
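A sketch of the derivatives argument and the check a Problem class could perform. The DemoModel/DemoProblem classes are hypothetical, and ValueError is an arbitrary choice of exception type:

```python
# Hypothetical model with first-order sensitivities only
class DemoModel:
    """y(t) = a * t, with dy/da = t."""

    def n_derivatives(self):
        return 1  # supports derivatives=0 and derivatives=1

    def simulate(self, parameters, times, derivatives=0):
        (a,) = parameters
        y = [a * t for t in times]
        if derivatives == 0:
            return y
        return y, [t for t in times]


# Hypothetical problem class: passes n_derivatives() on from the model
# and rejects requests the model cannot honour.
class DemoProblem:
    def __init__(self, model, times):
        self._model = model
        self._times = times

    def n_derivatives(self):
        return self._model.n_derivatives()

    def evaluate(self, parameters, derivatives=0):
        if derivatives > self._model.n_derivatives():
            raise ValueError('Model does not provide derivatives of this order')
        return self._model.simulate(parameters, self._times, derivatives)
```

A method can then ask the problem for its n_derivatives() up front and fall back to a derivative-free strategy if the answer is too low.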
yup, I think the n_derivatives argument is the way to go. Would higher level classes (e.g. log likelihood), have a derivatives argument as well (which they just pass down)?
@martinjrobins yup!
However, that method makes the documentation for ForwardModel much harder to read, so that the ForwardModel interface (users' main entry point!) becomes harder to understand.
We can probably make life simpler by having:
ForwardModel
simulate(p, times) --> y
ForwardModelS1
simulate(p, times, derivatives=0) --> y
simulate(p, times, derivatives=1) --> (y, y')
ForwardModelS2
simulate(p, times, derivatives=0) --> y
simulate(p, times, derivatives=1) --> (y, y')
simulate(p, times, derivatives=2) --> (y, y', y'')
XOutputProblem
evaluate(p, derivatives=0) --> y
evaluate(p, derivatives=1) --> (y, y') or an exception if the model doesn't support it
evaluate(p, derivatives=2) --> (y, y', y'') or an exception if the model doesn't support it
n_derivatives() --> The maximum number of derivatives available (passed on from Model)
LogLikelihoods etc. still all use an n_derivatives() method to indicate their level of support.
Closed with #358
Gary wrote: "Another 'whilst I remember' type thing! It would be good to get boring-old-Fisher Information / Hessian at the MLE and the covariance matrix that that implies, so we could compare max likelihood with Bayesian for some of these problems. Some of our peaks are so unimodal I suspect it may be an excellent approximation for a lot of our problems, and a zillion times faster."