loco-3d / crocoddyl

Crocoddyl is an optimal control library for robot control under contact sequence. Its solver is based on various efficient Differential Dynamic Programming (DDP)-like algorithms
BSD 3-Clause "New" or "Revised" License
858 stars 174 forks source link

Optimize the [F]DDP Line Search for MPC #1104

Closed andreadelprete closed 1 year ago

andreadelprete commented 2 years ago

Hi everybody,

@GianniLunardi and I are trying to reduce the computation time of DDP (or FDDP) for an MPC application. Thanks to the parallelization of the backward pass, now the bottleneck is the forward pass, which takes up to 75% of the time in some tests. Looking at the implementation of the forward pass I was therefore wondering about whether we could improve a few things.

First, since in case the expected improvement is negative (i.e. dVexp_ < 0) the step is not accepted (regardless of the value of dV), I don't understand why in those cases we should compute dV with the tryStep function. Isn't that a waste of time? We could first compute dVexp, check whether it's positive, and only in those cases compute dV.

Second, the fact that dVexp_ < 0 means that the local approximation of the Q function is concave. However, this doesn't necessarily mean the step is not gonna improve the cost. So I wonder whether there could be something smarter we could do with the computed step in case dVexp_ < 0. Maybe simply checking whether dV > 0?

Third, the current API of the solvers does not allow us to change the values of alpha used in the line search. Is there a reason for that? How did people who did MPC with crocoddyl handle this? Testing 10 values of alpha can be computationally demanding and blow up the computation time of your MPC iteration. Ideally we should have an "adaptive line search" that tests as many values of alpha as possible in the remaining time of the current MPC iteration.

Finally, can't we parallelize the forward pass computing the tryStep method for multiple values of alpha in parallel?

cmastalli commented 2 years ago

Hi everybody,

@GianniLunardi and I are trying to reduce the computation time of DDP (or FDDP) for an MPC application. Thanks to the parallelization of the backward pass, now the bottleneck is the forward pass, which takes up to 75% of the time in some tests. Looking at the implementation of the forward pass I was therefore wondering about whether we could improve a few things.

Hi @andreadelprete

Do you refer to the parallelisation of computing the derivatives? (This is not the same to parallelise the backward pass). If you are talking about the backward pass, then it might be good to arrange a meeting. In the last 5 months, we have done some progress in paralleling both backward and forward passes. This is still an ongoing research activity.

First, since in case the expected improvement is negative (i.e. dVexp_ < 0) the step is not accepted (regardless of the value of dV), I don't understand why in those cases we should compute dV with the tryStep function. Isn't that a waste of time? We could first compute dVexp, check whether it's positive, and only in those cases compute dV.

Yes, we could run tryStep depending on the result of dVexp. I guess we haven't paid so much attention to improving DDP as everyone is using FDDP or Box-FDDP. Feel free to make a PR about it.

Second, the fact that dVexp_ < 0 means that the local approximation of the Q function is concave. However, this doesn't necessarily mean the step is not gonna improve the cost. So I wonder whether there could be something smarter we could do with the computed step in case dVexp_ < 0. Maybe simply checking whether dV > 0?

I guess you are still talking about the DDP and not FDDP, right? In any case, I am not aware of a smarter strategy. I am happy to read any paper that proposes something better. But, this criteria is already good. I personally spent some decent time evaluating them, including in cases with constraints.

Third, the current API of the solvers does not allow us to change the values of alpha used in the line search. Is there a reason for that? How did people who did MPC with crocoddyl handle this? Testing 10 values of alpha can be computationally demanding and blow up the computation time of your MPC iteration. Ideally we should have an "adaptive line search" that tests as many values of alpha as possible in the remaining time of the current MPC iteration.

The devel branch allows you to change alpha, I don't remember about the master one. Are you sure you cannot change alpha?

On the other hand, we have implemented a simple backtracking procedure. It is indeed worth investing time in finding a better strategy. Do you have any paper that suggests a better alternative that is still cheat to compute?

Finally, can't we parallelize the forward pass computing the tryStep method for multiple values of alpha in parallel?

Yes, we can, but as I said below we have been working on a potentially better idea. I could tell you more in a face-to-face meeting, if you're interested in the topic.

andreadelprete commented 2 years ago

Hi @cmastalli

thanks for the fast reply!

Do you refer to the parallelisation of computing the derivatives?

Yes, sorry I took that for granted.

Yes, we could run tryStep depending on the result of dVexp. I guess we haven't paid so much attention to improving DDP as everyone is using FDDP or Box-FDDP. Feel free to make a PR about it.

Got it

Second, the fact that dVexp_ < 0 means that the local approximation of the Q function is concave. However, this doesn't necessarily mean the step is not gonna improve the cost. So I wonder whether there could be something smarter we could do with the computed step in case dVexp_ < 0. Maybe simply checking whether dV > 0?

I guess you are still talking about the DDP and not FDDP, right? In any case, I am not aware of a smarter strategy. I am happy to read any paper that proposes something better. But, this criteria is already good. I personally spent some decent time evaluating them, including in cases with constraints.

Yes, on this the FDDP does something different. I am not aware of any state-of-the-art alternative, I was just wondering whether somebody knew one.

The devel branch allows you to change alpha, I don't remember about the master one. Are you sure you cannot change alpha?

You're right, my bad, you can actually set the list of alpha values even on master.

On the other hand, we have implemented a simple backtracking procedure. It is indeed worth investing time in finding a better strategy. Do you have any paper that suggests a better alternative that is still cheat to compute?

No, but I have this simple idea that seems better than nothing. We should have an "adaptive line search" that tests as many values of alpha as possible in the remaining time of the current MPC iteration. I guess we could do that easily by dynamically changing the list of alpha values after measuring how much time the backward pass has taken.

Finally, can't we parallelize the forward pass computing the tryStep method for multiple values of alpha in parallel?

Yes, we can, but as I said below we have been working on a potentially better idea. I could tell you more in a face-to-face meeting, if you're interested in the topic.

Sure, I'll contact you by email then.

cmastalli commented 2 years ago

I know there are some ideas in the numerical-optimisation literature that design the next step length based on performance of previous ones. The whole idea is to get the most 'descending" result, as a bigger step is not necessary the optimal one.

Another possibility is to investigate other conditions, instead of the naive Armijo one.

andreadelprete commented 2 years ago

Coming back to the forward pass of FDDP, I don't understand the logic of the code for the case dVexp<0. The comment says "reducing the gaps by allowing a small increment in the cost value", which makes sense, but if that's the objective, then a cost increment should only be allowed if is_feasible==false. Or am I missing something? Currently we are accepting a cost increase even if the gaps are already zero.

cmastalli commented 2 years ago

Coming back to the forward pass of FDDP, I don't understand the logic of the code for the case dVexp<0. The comment says "reducing the gaps by allowing a small increment in the cost value", which makes sense, but if that's the objective, then a cost increment should only be allowed if is_feasible==false. Or am I missing something? Currently we are accepting a cost increase even if the gaps are already zero.

(if my understanding is correct) dVexp is always positive (i.e., descend direction) when the gaps are zero; it boils down to the DDP approach. And this statement comes from the fact that we are using a Cholesky decomposition equipped with a regularisation scheme. Therefore, it won't happen the following sentence "we are accepting a cost increase even if the gaps are already zero". Perhaps, we could double-check this empirically.

However, I agree that we might want to include another test in that if-loop (i.e., is_feasible==false).

Let me do a couple of extra notes that could be useful here:

  1. We evaluate dVexp > 0 in DDP as we could have an infeasible warm start.
  2. dVext can have a negative value when there is a gap, and it depends on the rollout itself (see Eq. 19 in https://arxiv.org/pdf/1909.04947.pdf)
andreadelprete commented 2 years ago

Coming back to the forward pass of FDDP, I don't understand the logic of the code for the case dVexp<0. The comment says "reducing the gaps by allowing a small increment in the cost value", which makes sense, but if that's the objective, then a cost increment should only be allowed if is_feasible==false. Or am I missing something? Currently we are accepting a cost increase even if the gaps are already zero.

(if my understanding is correct) dVexp is always positive (i.e., descend direction) when the gaps are zero; it boils down to the DDP approach. And this statement comes from the fact that we are using a Cholesky decomposition equipped with a regularisation scheme. Therefore, it won't happen the following sentence "we are accepting a cost increase even if the gaps are already zero".

Ok, so what you are saying is that, in case Quu is not positive-definite, even after adding the regularization, then the Cholesky decomposition would fail and the backward pass would be repeated with a larger regularization. Therefore, once we get to the forward pass we can be sure that Quu_inverse is positive-definite, and therefore dVexp>0. Is my understanding correct?

1. We evaluate `dVexp > 0` in DDP as we could have an infeasible warm start.

Good to know, I didn't know this.

cmastalli commented 2 years ago

Ok, so what you are saying is that, in case Quu is not positive-definite, even after adding the regularization, then the Cholesky decomposition would fail and the backward pass would be repeated with a larger regularization. Therefore, once we get to the forward pass we can be sure that Quu_inverse is positive-definite, and therefore dVexp>0. Is my understanding correct?

YES!

andreadelprete commented 2 years ago

Ok, so to summarize, the possible actions to take now are the following:

  1. Improve the forward pass of the DDP solver to avoid the computation of dV in case dVexp<0
  2. Implement a new forward pass for MPC application, where rather than testing values of alpha until a given condition is satisfied, we test all the user-specified values of alpha and take the one who gave the best result. The number of values of alpha could be automatically adjusted by the solver itself based on how much time is left in the current MPC iteration after computing the backward pass.

Would you agree that the second point is interesting for MPC applications? This could actually be done for all solvers (DDP, FDDP, boxDDP, ...).

cmastalli commented 2 years ago

Ok, so to summarize, the possible actions to take now are the following:

  1. Improve the forward pass of the DDP solver to avoid the computation of dV in case dVexp<0
  2. Implement a new forward pass for MPC application, where rather than testing values of alpha until a given condition is satisfied, we test all the user-specified values of alpha and take the one who gave the best result. The number of values of alpha could be automatically adjusted by the solver itself based on how much time is left in the current MPC iteration after computing the backward pass.

Would you agree that the second point is interesting for MPC applications? This could actually be done for all solvers (DDP, FDDP, boxDDP, ...).

I agree! I am happy if you (or someone in your team) want to take the lead. I would suggest creating a repo with the Python code of the solver. I am also happy to share our equality-constrained DDP solver written in Python. I think this last point would help to try these developments in a wider range of problems, e.g., with inverse dynamics.

Note that there is still a number of modifications to be merged, so I would suggest validating them in Python and writing in c++ when it is convenient. If this sounds good to you, then we could arrange a meeting to discuss more it.

andreadelprete commented 2 years ago

Actually this seems a rather straightforward extension of the current solvers, so I was thinking of implementing it directly in C++. My idea would be to create a child class for each solver (e.g., DDP -> DDP_MPC, FDDP -> FDDP_MPC), where we just reimplement the forward pass according to the logic described above. Could that work, or are you afraid it would clash with the ongoing modifications that haven't been merged yet?

cmastalli commented 2 years ago

Actually this seems a rather straightforward extension of the current solvers, so I was thinking of implementing it directly in C++. My idea would be to create a child class for each solver (e.g., DDP -> DDP_MPC, FDDP -> FDDP_MPC), where we just reimplement the forward pass according to the logic described above. Could that work, or are you afraid it would clash with the ongoing modifications that haven't been merged yet?

Yes, it is but it is not often straightforward to analyse the behaviour in c++. But I will let you work as you prefer.

It could create conflicts, but if you modify only a few lines (as expected) then it is manageable. The only thing is that I would like to see results in our unmerged equality-constrained solver. But, you could cherry-pick your commit into this branch: https://github.com/cmastalli/crocoddyl/tree/topic/preview-2-rebasing for further tests.

cmastalli commented 2 years ago

Btw, please do not create a child class for each solver. You can start working from this branch: #1107. If you finish before merging this PR, then you have two options: (1) wait or (2) create a PR to my branch.

cmastalli commented 1 year ago

Ok, so to summarize, the possible actions to take now are the following:

  1. Improve the forward pass of the DDP solver to avoid the computation of dV in case dVexp<0
  2. Implement a new forward pass for MPC application, where rather than testing values of alpha until a given condition is satisfied, we test all the user-specified values of alpha and take the one who gave the best result. The number of values of alpha could be automatically adjusted by the solver itself based on how much time is left in the current MPC iteration after computing the backward pass.

Would you agree that the second point is interesting for MPC applications? This could actually be done for all solvers (DDP, FDDP, boxDDP, ...).

I have implemented many improvements in the step acceptance, regularization and rollout in a private branch, which will be released later. In any case, point 1 is not applicable anymore. In short, the idea of not computing dV when dVexp < 0 is wrong. When dealing with constraints, we need to balance between cost improvement (optimality) and constraint satisfaction (feasibility). The suggested approach would interfere with this.

The second point still remains valid despite that I DID implement a new approach for the forward pass. But, @andreadelprete you shouldn't work on this as I am working on a major improvement of solvers. There are many features we're pushing forward, and this will take us some time. I'll handle this.

Please accept closing this issue, as there is no need for further discussions or input from others. Thanks!