Line Search in DDP/iLQR

hanyas commented 2 years ago

I've noticed that in addition to the Hessian regularization in DDP, you describe and implement a line-search for the scaling of the feed-forward term of the controller.

I am aware that Mayne (1967), Liao and Shoemaker (1992), and Tassa (2012) all implement such a line-search. However, it appears to me that Tassa (2012), at least, decouples the backward-pass Hessian regularization from the feed-forward scaling. Your paper and implementation use the same step to regularize the Hessian and scale the feed-forward term.

I am wondering what this scaling of the step corresponds to in general. My current understanding is that Levenberg-Marquardt algorithms usually do not require additional scaling of the step after the direction is rotated, or am I mistaken here?

vroulet commented 2 years ago

Hello Hany,

In Section 6 of the technical report "Iterative Linear Quadratic Optimization for Nonlinear Control: Differentiable Programming Algorithmic Templates" (most recent version available here), I describe two ways to think about the stepsize. Either one treats the reciprocal of the regularization as the stepsize, so there is only one scaling parameter, or one takes the more classical approach of moving along the direction computed by the oracle. In the second case, for the line-search to be meaningful, the computed direction must be a descent direction, as explained on page 18 of the report. In practice, in the code, on lines 49 to 63 and 98 to 110, I add a regularization to ensure that the computed direction is a descent direction. So when using an algorithm with line-search on a descent direction, I have a system of two scaling parameters similar to the references you mention: a small regularization to ensure a descent direction, and then a stepsize that determines how far to move along this direction.
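As a rough illustration of the second scheme, here is a toy scalar sketch. It is not the code from the repository: the function names, the growth factor, and the Armijo constants are all assumptions made for this example. The regularization is raised only until the regularized curvature is positive, which makes the direction a descent direction, and a separate backtracking stepsize then decides how far to move along it.

```python
def regularized_direction(grad, hess, reg=0.0, reg_growth=10.0, min_reg=1e-6):
    """Regularized Newton direction for a scalar problem (hypothetical helper).

    The regularization is increased until hess + reg > 0, so that the
    returned direction -grad / (hess + reg) points downhill.
    """
    while hess + reg <= 0.0:
        reg = max(min_reg, reg * reg_growth)
    return -grad / (hess + reg), reg


def backtracking_line_search(f, x, grad, direction, shrink=0.5, c=1e-4, max_iter=30):
    """Armijo backtracking along a descent direction (hypothetical helper).

    Returns the accepted stepsize, or 0.0 if none is found in max_iter trials.
    """
    fx = f(x)
    slope = grad * direction  # negative when direction is a descent direction
    stepsize = 1.0
    for _ in range(max_iter):
        if f(x + stepsize * direction) <= fx + c * stepsize * slope:
            return stepsize
        stepsize *= shrink
    return 0.0


if __name__ == "__main__":
    # Nonconvex toy objective: the second derivative is negative near x = 0.3,
    # so the plain Newton direction would point uphill there.
    f = lambda x: x**4 - 2 * x**2
    x = 0.3
    grad = 4 * x**3 - 4 * x
    hess = 12 * x**2 - 4

    direction, reg = regularized_direction(grad, hess)
    stepsize = backtracking_line_search(f, x, grad, direction)
    print("regularization:", reg, "stepsize:", stepsize)
    print("objective before:", f(x), "after:", f(x + stepsize * direction))
```

In this sketch the two parameters play separate roles: `reg` only fixes the sign of the curvature, while `stepsize` controls the length of the move.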

Regarding your questions, I have tried to explain the line-search implementations as best I could in the report. Concerning the Levenberg-Marquardt method, the regularization can either be kept constant or updated as the algorithm proceeds. Usually one would not want to change the regularization parameter, because computing a direction for each new value a priori requires solving a linear system, which is computationally expensive. However, in nonlinear control, this system has a specific structure that makes it easily solvable by dynamic programming. Hence modifying the regularization as the algorithm proceeds is meaningful here.
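To make the dynamic programming point concrete, here is a minimal sketch on a scalar linear-quadratic subproblem. It is an assumption-laden toy, not the repository's implementation: the name `lq_step`, the scalar dynamics `y_next = a * y + b * v`, and the cost coefficients are all hypothetical. The regularization `reg` is simply added to the control cost, and each new value of `reg` is handled by one backward recursion and one forward rollout, both linear in the horizon.

```python
def lq_step(a, b, q, r, state_grads, control_grads, reg):
    """Solve a regularized scalar linear-quadratic subproblem (toy sketch).

    Minimizes over v_0, ..., v_{T-1}, with y_0 = 0 and y_{t+1} = a*y_t + b*v_t:
        sum_t 0.5*q*y_{t+1}**2 + state_grads[t]*y_{t+1}
              + 0.5*(r + reg)*v_t**2 + control_grads[t]*v_t
    """
    horizon = len(control_grads)
    gains = [0.0] * horizon
    offsets = [0.0] * horizon
    P, p = 0.0, 0.0  # cost-to-go beyond the last state: 0.5*P*y**2 + p*y

    # Backward pass: one Riccati-like recursion, linear in the horizon.
    for t in reversed(range(horizon)):
        P_next = q + P               # quadratic cost seen by y_{t+1}
        p_next = state_grads[t] + p  # linear cost seen by y_{t+1}
        quu = r + reg + b * b * P_next
        qux = a * b * P_next
        qu = control_grads[t] + b * p_next
        gains[t] = -qux / quu
        offsets[t] = -qu / quu
        P = a * a * P_next - qux * qux / quu
        p = a * p_next - qux * qu / quu

    # Forward pass: roll out the affine policy from y_0 = 0.
    y, controls = 0.0, []
    for t in range(horizon):
        v = gains[t] * y + offsets[t]
        controls.append(v)
        y = a * y + b * v
    return controls


if __name__ == "__main__":
    state_grads = [1.0, -2.0, 0.5]
    control_grads = [0.3, 0.0, -0.1]
    # Re-solving for several regularizations only repeats the cheap recursion.
    for reg in (1.0, 10.0, 100.0):
        step = lq_step(0.9, 0.5, 1.0, 0.1, state_grads, control_grads, reg)
        print("reg =", reg, "squared step norm =", sum(v * v for v in step))
```

A larger `reg` shrinks the step, which is why its reciprocal can be read as a stepsize in the single-parameter view.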

If you have any other questions, it may be easier to schedule a quick meeting.

All the best,

Vincent

hanyas commented 2 years ago

Thank you, Vincent. I'll check the relevant parts of the report for more info and drop you a line if I have any questions. I appreciate it.

vroulet / ilqc