Solving a batch of trajectory optimization problems on a GPU

andreadelprete commented 7 months ago

Hi,

I am looking for a library to solve batches of trajectory optimization problems (same problem, different initial states) on accelerators, and I've found this library, which looks great! I have some experience with JAX, so this library would be perfect for me, but I am struggling to understand how much it was designed for my use case. As far as I know, GPUs are not suitable for matrix decompositions, and I saw that your iLQR solver relies on solving linear systems. Would you agree that this could be a bottleneck for my use case? And out of curiosity: what is the use case for which this library was designed?

larsrpe commented 7 months ago

Hi, I can't answer any Trajax related questions, but I think we are looking for the same thing. I have yet to find a good way on how to solve "batched" TOs, however this publication looks promising, leveraging the a new CUDA enabled spare linear system solver: https://discourse.julialang.org/t/examodels-jl-and-madnlp-jl-on-gpus/111377.

In their work they use GPU to accellerate large scale sparse optimal control. One approach would be to calculate all objectives and constraints in parallell on a GPU and the "concatenating" everything into one large, and very sparse TO.

Let me know if you find anything interesting!

andreadelprete commented 7 months ago

Thanks for the feedback @larsrpe . However I think that concatenating everything into a large sparse TO would be suboptimal in terms of performance. Moreover, the performance gains reported at that link are not as good as I was hoping for, which makes sense because solving a single TO problem on GPU is much more challenging that solving a batch of TOs on GPU in my opinion. I'll let you know if I find anything.

larsrpe commented 7 months ago

I agree with your insight. I think one option would be to use this batched QP solver https://github.com/locuslab/qpth combined with a batched line search and Hessian Update. Basically a batched SQP.

andreadelprete commented 7 months ago

Yes, that indeed could be an option. Alternatively, one could modify their QP solver to become a nonlinear solver. Since it's based on a Primal-Dual Interior Point algorithm, I'd say it shouldn't be too difficult.

larsrpe commented 7 months ago

I agree. Maybe I will find some time to look into it!

ssingh19 commented 6 months ago

Hello @andreadelprete. What kind of problems are you solving? If unconstrained (i.e., dynamics constraints only), the iLQR solve call can be wrapped with jax's vmap function. This will give you parallelization within a single device. As to how the matrix factorizations in the TV-LQR step are "parallelized," this will be largely abstracted away by the XLA code.

If your problem is more complex (has state-control constraints that you don't want to treat as cost penalties), you can use the SQP solver in experimental. However, the outer-most part of this solver needs a slight restructuring to enable wrapping with vmap (I have to remove some non-jittable parts like verbose printing, etc).

andreadelprete commented 6 months ago

I am interested in both unconstrained and constrained problems, so I could start experimenting with unconstrained ones using iLQR with vmap. Thanks for the feedback @ssingh19 ! Do you mind if we leave this issue open to discuss future developments for this use case?

larsrpe commented 6 months ago

Cool! Can I ask what kind of OC problems you are solving @andreadelprete ?

andreadelprete commented 6 months ago

Cool! Can I ask what kind of OC problems you are solving @andreadelprete ?

Sure, you can find some examples of the unconstrained problems in our paper "CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards global optimality", https://arxiv.org/abs/2211.06625

google / trajax

Solving a batch of trajectory optimization problems on a GPU #15