Closed: charlesm93 closed this issue 4 years ago.
I am not sure the parallelisation from KINSOL is useful for us. I looked at this for CVODES a while ago and concluded it's not useful for our applications. We usually solve rather small ODE systems, such that each BDF iteration is fast. These parallel features in SUNDIALS are useful (to my understanding) when each BDF step is already a performance bottleneck; but if we were to throw such a large problem at Stan, it would never finish.
Solving small to medium ODEs is already demanding in a Bayesian framework. Going for models so large that each BDF step needs parallelisation to be bearable would be beyond what's doable. At least this was my take on this. Maybe KINSOL is different, but just a bit of food for thought.
In population PK models we are already solving large linear systems, so it's doable. Whether we parallelize at the population level or at the linear-system level, it's just a matter of decomposing the linear system in different ways. In addition, there are plenty of problems outside the IVP ODE field that can benefit from a parallel solver. Simple examples can be found in parameterized BVP ODEs/PDEs such as the Poisson equation, which shows up more than frequently in engineering design and optimization.
-YZ
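To make the BVP point concrete, here is a minimal sketch. This is hypothetical illustration code (not from SUNDIALS, Stan, or this thread): it discretizes the parameterized Poisson problem -u''(x) = theta on (0, 1) with u(0) = u(1) = 0 using central differences, which yields a tridiagonal linear system solved here with the Thomas algorithm.

```python
def solve_poisson_1d(theta, n):
    """Solve -u'' = theta on (0, 1), u(0) = u(1) = 0, on n interior points."""
    h = 1.0 / (n + 1)
    # Central differences give (2 u_i - u_{i-1} - u_{i+1}) / h^2 = theta,
    # i.e. a tridiagonal system with sub/super-diagonals -1 and diagonal 2.
    a = [-1.0] * n           # sub-diagonal
    b = [2.0] * n            # diagonal
    c = [-1.0] * n           # super-diagonal
    d = [theta * h * h] * n  # right-hand side
    # Thomas algorithm: forward elimination...
    for i in range(1, n):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    # ...then back substitution.
    u = [0.0] * n
    u[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (d[i] - c[i] * u[i + 1]) / b[i]
    return u

u = solve_poisson_1d(theta=2.0, n=99)
# Exact solution is u(x) = theta * x * (1 - x) / 2; at x = 0.5 that is
# theta / 8 = 0.25, which the scheme reproduces exactly for quadratics.
```

The exercise is serial, but each such solve is one Newton/linear-solve building block; a parameterized design loop would repeat it for many theta values, which is where parallelism pays off.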
It would be great to see an example where parallel ODE solving is helping to speed up things.
I would expect that this is too granular and that parallelising things on a different level makes more sense... but of course, if one only cares about a single large ODE system, then this is a different story.
I think at this point we should decide how the Newton solver is going to be used. Since there are plenty of knobs to turn in the Newton family, if we expose the solver like powell_solver, we want to be judicious about the controls we expose so it doesn't become a footshooter. On the other hand, it's most likely also a building block for math functions, in which case maybe we want to keep the SUNDIALS API complete.
> It would be great to see an example where parallel ODE solving is helping to speed up things. I would expect that this is too granular and that parallelising things on a different level makes more sense... but of course, if one only cares about a single large ODE system, then this is a different story.
Solving a bunch of small ODEs (for example, using map_rect) is equivalent to solving a single large ODE that generates a block-diagonal linear system at every time step. So if we throw a population PK model at CVODES and solve it using its parallel interface, we get the same benefit. In fact, I would argue solving the popPK model as one large system has the additional benefit of more robust reordering and convergence checking.
So small vs large ODE system really depends on one's perspective.
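The block-diagonal equivalence can be sketched in a few lines. Hypothetical illustration code (the 2x2 Cramer-rule solve stands in for whatever dense block solver one would actually use); the point is that each block's solve is independent, so the "one large system" and "many small systems" views coincide:

```python
def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def solve_block_diagonal(blocks, rhs):
    """Solve diag(A_1, ..., A_K) x = b block by block.

    Each (A_k, b_k) pair is independent, so each iteration of this loop
    could be handed to a separate thread or process without changing
    the result."""
    return [x for A, b in zip(blocks, rhs) for x in solve2(A, b)]

# Two patients, one 2x2 block each, stacked into one system:
blocks = [[[4.0, 1.0], [1.0, 3.0]],
          [[2.0, 0.0], [0.0, 5.0]]]
rhs = [[1.0, 2.0], [2.0, 10.0]]
x = solve_block_diagonal(blocks, rhs)
# x is [1/11, 7/11, 1.0, 2.0]: identical to solving each block alone.
```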
Can we do something akin to map() and MPI?
I don't think map is relevant here, as we expect to solve the entire system, not collectively solve a bunch of nonlinear systems. As for MPI, it's more complicated. SUNDIALS doesn't support MPI for direct solvers; for that one has to look at things like MUMPS or PARDISO, but license-wise both are hopeless. For MPI, SUNDIALS supports iterative solvers, which opens a can of worms of more knobs to turn and problems to tune. Since the only difference is the vector interface, I'd say we start from the OpenMP or pthread interface when dealing with a parallel linear solver.
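To illustrate the "knobs to turn" point about iterative solvers, here is a minimal, hypothetical sketch of plain Jacobi iteration. Even this simplest method needs a tolerance, an iteration cap, and a convergence condition on the matrix (diagonal dominance here), none of which a direct solve asks the user to tune:

```python
def jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b by Jacobi iteration.

    Converges for strictly diagonally dominant A; both `tol` and
    `max_iter` are exactly the kind of extra controls a direct solver
    does not need."""
    n = len(b)
    x = [0.0] * n
    for it in range(max_iter):
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i))
                 / A[i][i]
                 for i in range(n)]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, it + 1
        x = x_new
    raise RuntimeError("Jacobi failed to converge: tune tol/max_iter")

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = jacobi(A, b)
# Converges to the exact solution [1/11, 7/11].
```

Krylov methods like GMRES add further knobs on top (restart length, preconditioner choice), which is the tuning burden being referred to.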
For problems that can generate Jacobian by blocks in parallel, then we can look at the linear system solving from the angle of domain decomposition.
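A hypothetical sketch of that angle: when each patient or subdomain contributes an independent residual block, the Jacobian blocks can be formed in parallel (finite differences here, purely for illustration) before any block-wise solve. All names below are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

def fd_jacobian(f, y, eps=1e-6):
    """Dense forward finite-difference Jacobian of f at y."""
    f0 = f(y)
    cols = []
    for j in range(len(y)):
        yp = list(y)
        yp[j] += eps
        fj = f(yp)
        cols.append([(fj[i] - f0[i]) / eps for i in range(len(f0))])
    return [list(row) for row in zip(*cols)]  # columns -> rows

def make_block(theta):
    # One subdomain's residual: f(y) = (y0^2 - theta, y0*y1 - 1).
    return lambda y: [y[0] ** 2 - theta, y[0] * y[1] - 1.0]

# Each block depends only on its own theta, so the Jacobian blocks can
# be assembled concurrently.
thetas = [2.0, 3.0, 4.0]
with ThreadPoolExecutor() as ex:
    jacs = list(ex.map(lambda th: fd_jacobian(make_block(th), [1.0, 1.0]),
                       thetas))
# Each block's Jacobian at (1, 1) is approximately [[2, 0], [1, 1]].
```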
Ok, going for a block-diagonal thing I can see this may work... but at least for my datasets I still doubt it would be a good idea. Patients are on slightly different schedules, sometimes even very different schedules. Not all subjects have the same ODE stiffness properties, etc. Solving in small units has the additional benefit of almost surely keeping everything inside the innermost CPU caches and the like.
There are certainly applications for large solves, yes, but my guess is that these are rather rare in comparison. I can be wrong here, of course.
I agree the differences in patient dosing events warrant individual-level parallelization, which is why Torsten does the same. But here we are talking about a general nonlinear solver, which will potentially be part of various applications other than IVP ODEs, so I don't think we should make more assumptions than necessary, especially at this early stage.
If you can write it to work with banded and dense solvers... then go for it!
@yizhang-cae thanks for all the input.
@wds15 I'm not clear on the added benefit for PK/PD applications, especially since we already have between-patient parallelization. But Yi has a point: many small ODEs can be thought of as one large ODE system. I also think solving large systems is required for PBPK and systems pharmacology models. Maybe these are hopeless with HMC, but approachable with approximate methods.
Also, my motivating problem here is finding the mode for a Laplace approximation, and here we can have up to thousands of equations. But yeah, this is orthogonal to pharmacometrics.
@charlesm93 For multi-modal pdfs you'll need to solve multiple approximations; is that something that can also be parallelized? (Not necessarily at the linear-solver level.)
@charlesm93 Approximation methods like the Laplace approach you are working on would be hugely useful if shown to work fast and reliably (even if they only work for a sub-class of models).
Formulating things as a large system (be it ODEs or whatever) could speed up things given you manage to express the sparseness of the system, I think... and that's the bummer. Expressing the sparseness is not trivial at all.
Moving this issue to math, see https://github.com/stan-dev/math/issues/1115.
Note: I accidentally created this issue before creating the math issue. The first comments below are relevant to the issue on the math repo.
Summary:
Expose the function described in math#1115.
Description:
This can at first be done by "overloading" the signature for algebra_solver and splitting it into two signatures: algebra_solver_powell() and algebra_solver_newton(). We will then need a more sophisticated API to allow users to parallelize.
Reproducible Steps:
See the issues posted on the forum.
Expected Output:
A faster algebraic solver applicable to large systems.
Additional Information:
NA
Current Version:
v2.18.1