loco-3d / crocoddyl

Crocoddyl is an optimal control library for robot control under contact sequences. Its solver is based on various efficient Differential Dynamic Programming (DDP)-like algorithms.
BSD 3-Clause "New" or "Revised" License

Error in computation of Value function's Hessian? #925

Closed: andreadelprete closed this issue 3 years ago

andreadelprete commented 3 years ago

Looking at the code in the DDP solver, it seems that the Value function's Hessian is computed as:

Vxx = Qxx - Qxu*K 

As far as I know, this formula is correct when the regularization (on the control and the state) is null. However, when it's not null, this formula should be used:

Vxx = Qxx - Qxu*K - K^T*Qux + K^T*Quu*K

We can easily check that the two formulas are equivalent in the case of zero regularization by substituting K = Quu^-1 * Qux. This was explained in [1], in the paragraph above equation (11).

[1] Tassa, Y., Erez, T., & Todorov, E. (2012). Synthesis and stabilization of complex behaviors through online trajectory optimization. In Intelligent Robots and Systems (IROS), IEEE/RSJ International Conference on (pp. 4906–4913)
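A quick numerical check of this point (a minimal numpy sketch, not Crocoddyl code): with K = Quu^-1 * Qux the two expressions coincide, whereas if K is computed from a regularized Quu they no longer do.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nu = 4, 2

# Random symmetric positive-definite Q-function Hessian.
H = rng.standard_normal((nx + nu, nx + nu))
H = H @ H.T + (nx + nu) * np.eye(nx + nu)
Qxx, Qxu = H[:nx, :nx], H[:nx, nx:]
Qux, Quu = H[nx:, :nx], H[nx:, nx:]

def vxx_short(K):
    return Qxx - Qxu @ K

def vxx_long(K):
    return Qxx - Qxu @ K - K.T @ Qux + K.T @ Quu @ K

# Unregularized gain: the two expressions agree.
K = np.linalg.solve(Quu, Qux)
print(np.allclose(vxx_short(K), vxx_long(K)))          # True

# Gain computed with a regularized Quu: they differ.
mu = 1.0
K_reg = np.linalg.solve(Quu + mu * np.eye(nu), Qux)
print(np.allclose(vxx_short(K_reg), vxx_long(K_reg)))  # False
```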

wxmerkt commented 3 years ago

Related previous merge request on gepgitlab following an email conversation based on the same observation: https://gepgitlab.laas.fr/loco-3d/crocoddyl/-/merge_requests/241

cmastalli commented 3 years ago

In Crocoddyl, we are regularizing all the Hessian terms (not only the diagonal ones, i.e., Qxx and Quu). If we do so, the longer expression condenses into the shorter one. Additionally, Tassa's equation for updating the Jacobian and Hessian of the Value function seems to be wrong, as it uses the unregularized terms for Qxx, Quu and Qux. It is like comparing apples with pears! Thus, from that perspective, everything seems to be alright.

On the other hand, our regularization strategy is not strictly the one employed in nonlinear programming (e.g., KNITRO's interior-point algorithms). We are doing two dirty things that I understood a few months ago, but I did not have time to look into them.

First, we combine Vxx and Quu regularizations. Again, this is comparing apples with pears, as described below.

Thus, the current regularization in Crocoddyl is

[ Qxx + mu*fx^T*fx    Qxu + mu*fx^T*fu       ]
[ Qux + mu*fu^T*fx    Quu + mu*(I + fu^T*fu) ]

However, nonlinear programming solvers often regularize the diagonal terms only, as the problem is locally convex if the resulting matrix is positive definite, i.e.,

[ Qxx + mu*I    Qxu        ]
[ Qux           Quu + mu*I ]

or

[ Qxx + mu*fx^T*fx    Qxu              ]
[ Qux                 Quu + mu*fu^T*fu ]

Note that, in both regularization approaches, large mu values wash out the effect of the Hessian, so the step tends towards the steepest-descent direction! This is the behavior that we want, as we are implementing a Levenberg-Marquardt (LM) approach.
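To illustrate this last point, here is a minimal numpy sketch (illustrative only, not Crocoddyl code): as mu grows, the regularized step (H + mu*I)^-1 * g aligns with the gradient g, i.e., the steepest-descent direction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
H = rng.standard_normal((n, n))
H = H @ H.T + n * np.eye(n)    # a generic SPD Hessian block
g = rng.standard_normal(n)     # a generic gradient

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

for mu in [0.0, 1.0, 1e2, 1e6]:
    step = np.linalg.solve(H + mu * np.eye(n), g)
    # For large mu the step approaches g / mu, i.e., the gradient direction.
    print(f"mu = {mu:10.1f}  cos(step, g) = {cosine(step, g):.6f}")
```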

In conclusion, I believe we should spend time analyzing these regularization approaches and remove the dirty one (if the analysis supports it).

andreadelprete commented 3 years ago

After thinking a bit about it, I came to the conclusion that there are two ways to approach regularization in DDP.

Approach 1: Tassa

We can see regularization as a change in the local minimization problem that we solve at each step of the backward pass to compute k and K. Using this approach, the update of the Value function should be performed by substituting into the unregularized Q function the values of k and K computed with regularization. This is what Tassa does in eq. (11) of his paper, and I don't see this as a mistake.

Approach 2

We can see regularization as a change in the quadratic approximation of the nonlinear trajectory optimization problem. This boils down to adding a diagonal matrix to the Hessian of the cost (l_xx, l_uu), and it's equivalent to regularizing Q_xx and Q_uu. According to this approach, the update of the Value function should use the regularized Q function. Since the regularized Q terms are used both for the update of the Value function and for the computation of K and k, in this case the short expression for the Value function update can be used.
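To make the distinction concrete, here is a minimal numpy sketch of a single backward-pass step (a hypothetical backward_step helper, not Crocoddyl's API): approach 1 computes k and K from the regularized Quu but updates the Value function with the unregularized Q terms (hence the long expressions), while approach 2 regularizes the Q terms themselves (equivalently l_xx and l_uu), so the short expressions remain exact.

```python
import numpy as np

def backward_step(Qx, Qu, Qxx, Qxu, Quu, mu, approach):
    """One node of the backward pass (illustrative only)."""
    Qux = Qxu.T
    if approach == 2:
        # Approach 2: regularize the quadratic model itself (Qxx, Quu).
        Qxx = Qxx + mu * np.eye(Qxx.shape[0])
        Quu = Quu + mu * np.eye(Quu.shape[0])
        Quu_gains = Quu
    else:
        # Approach 1 (Tassa): regularize only the matrix used for the gains.
        Quu_gains = Quu + mu * np.eye(Quu.shape[0])

    k = np.linalg.solve(Quu_gains, Qu)
    K = np.linalg.solve(Quu_gains, Qux)

    if approach == 2:
        # The regularized Q is used everywhere, so the short expressions are exact.
        Vx = Qx - Qxu @ k
        Vxx = Qxx - Qxu @ K
    else:
        # Unregularized Q with regularized gains: the long expressions are needed.
        Vx = Qx - K.T @ Qu - Qxu @ k + K.T @ Quu @ k
        Vxx = Qxx - Qxu @ K - K.T @ Qux + K.T @ Quu @ K
    return k, K, Vx, Vxx
```

With mu = 0 the two branches coincide, which is the equivalence noted at the top of this issue.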

Conclusions

As far as I understand, @cmastalli is suggesting to use approach 2. If this is the case, the only change needed in the code is to stop using the longer expression for the update of the gradient of the Value function here when the regularization is not null.

PS: I think approach 2 is basically what's already implemented in Crocoddyl, and it's equivalent to the regularization of the Hessian of the cost that would be performed by a standard NLP solver. In your previous post, @cmastalli, you're applying an NLP solver's regularization to the Q function, but that's misleading because an NLP solver wouldn't see the Q function; it would see just the cost and the constraints. The Q function is a by-product of the DDP algorithm.

cmastalli commented 3 years ago

Regularising Qxx and Quu, as I described above, is equivalent to regularising the KKT linear system of equations computed by applying the Newton method (as typically done in NLP algorithms). What I have written is correct!

Let me try to explain this fact. The node-wise KKT equation is

KKT_matrix * [dx; du; dlmb; dx'] = KKT_vector

where

KKT_matrix = [ lxx   lxu   fx^T   0    ]
             [ lux   luu   fu^T   0    ]
             [ fx    fu    0     -I    ]
             [ 0     0    -I      vxx' ]

KKT_vector = - [ lx; lu; 0; vx' ]

From the above equation, we can observe that the current node depends on the next Value function (i.e., its Jacobian and Hessian). Furthermore, if we condense this equation we obtain the well-known expression

[ Qxx   Qxu ]   [ dx ]   [ -Qx ]
[ Qux   Quu ] * [ du ] = [ -Qu ]

which again computes the Newton step as done in NLP algorithms. It resembles a conventional NLP algorithm!
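For completeness, here is a small numpy check of this condensation (a sketch with randomly generated node data, independent of Crocoddyl's code): eliminating dlmb and dx' from the node-wise KKT system reproduces the condensed Q-form equations.

```python
import numpy as np

rng = np.random.default_rng(2)
nx, nu = 3, 2

# Random node data: cost Hessian blocks, dynamics Jacobians, next Value function.
L = rng.standard_normal((nx + nu, nx + nu))
L = L @ L.T + (nx + nu) * np.eye(nx + nu)         # [lxx lxu; lux luu], SPD
lxx, lxu = L[:nx, :nx], L[:nx, nx:]
lux, luu = L[nx:, :nx], L[nx:, nx:]
lx, lu = rng.standard_normal(nx), rng.standard_normal(nu)
fx, fu = rng.standard_normal((nx, nx)), rng.standard_normal((nx, nu))
vxx_next = np.eye(nx)                             # next Value Hessian (SPD)
vx_next = rng.standard_normal(nx)

# Node-wise KKT system in the unknowns [dx, du, dlmb, dx'].
I = np.eye(nx)
Z = np.zeros
KKT = np.block([
    [lxx,          lxu,          fx.T,         Z((nx, nx))],
    [lux,          luu,          fu.T,         Z((nu, nx))],
    [fx,           fu,           Z((nx, nx)), -I          ],
    [Z((nx, nx)),  Z((nx, nu)), -I,            vxx_next   ],
])
rhs = -np.concatenate([lx, lu, np.zeros(nx), vx_next])
sol = np.linalg.solve(KKT, rhs)
dx, du = sol[:nx], sol[nx:nx + nu]

# Condensed Q-form obtained by eliminating dlmb and dx'.
Qx, Qu = lx + fx.T @ vx_next, lu + fu.T @ vx_next
Qxx = lxx + fx.T @ vxx_next @ fx
Qxu = lxu + fx.T @ vxx_next @ fu
Qux = lux + fu.T @ vxx_next @ fx
Quu = luu + fu.T @ vxx_next @ fu
Q = np.block([[Qxx, Qxu], [Qux, Quu]])
print(np.allclose(Q @ np.concatenate([dx, du]),
                  -np.concatenate([Qx, Qu])))     # True
```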

The important bit to observe here is that regularising Qxx and Quu is equivalent to regularising lxx and luu, as I implicitly stated in my previous comment: since Qxx = lxx + fx^T*vxx'*fx and Quu = luu + fu^T*vxx'*fu, adding mu*I to lxx and luu adds the same mu*I to Qxx and Quu.

Furthermore, we can also observe that vxx' and vx' condense the information of the next node-wise KKT equation. So, if we regularise the entire OC problem, it also means that vxx' has to be computed with the regularised Qxx and Quu.

Therefore, Tassa's approach is not correct, nor is it supported by conventional methods in numerical optimisation. Additionally, my previous post is not misleading, as it's equivalent to what NLP algorithms do.

On the other hand, in Crocoddyl, we are currently closer to the right approach, but as described above, we are regularising additional terms such as Qxu and having different regularisation values for Qxx and Quu.

Let me know if my comment is clear ;)

andreadelprete commented 3 years ago

> What I have written is correct!

Actually I did not say that it was not...

> Furthermore, we can also observe that vxx' and vx' condense the information of the next node-wise KKT equation. So, if we regularise the entire OC problem, it also means that vxx' has to be computed with the regularised Qxx and Quu.

Yes, I agree, and this is what I've called "approach 2".

> Therefore, Tassa's approach is not correct, nor is it supported by conventional methods in numerical optimisation.

I think Tassa's approach is correct as well; it's just regularizing a different problem. Instead of regularizing the trajectory optimization problem, Tassa is regularizing the point-wise quadratic minimization that you solve at each point in time during the backward pass.

> Additionally, my previous post is not misleading, as it's equivalent to what NLP algorithms do.

Probably it's a complex subject that we should discuss in person rather than in an issue.

> On the other hand, in Crocoddyl, we are currently closer to the right approach, but as described above, we are regularising additional terms such as Qxu and having different regularisation values for Qxx and Quu.

I think the main change needed in Crocoddyl is to stop using the longer expression for the update of the gradient of the Value function when the regularization is not null (even though it should give the same results). Moreover, if you want to use "approach 2", we should remove the direct regularization of Quu. Note, however, that approaches 1 and 2 could also be used together, which is kind of what Crocoddyl is currently doing.

cmastalli commented 3 years ago

@andreadelprete this is indeed a hard topic to discuss here.

It would be useful to discuss it in person. Perhaps it would also be useful to prepare a short report with the math.

cmastalli commented 3 years ago

> I think the main change needed in Crocoddyl is to stop using the longer expression for the update of the gradient of the Value function when the regularization is not null (even though it should give the same results).

I missed commenting on this sentence. I agree with you, although I also think it should lead to the same result. What is needed is to regularize Qxx (as we do with Quu). See my previous comment:

> However, nonlinear programming solvers often regularize the diagonal terms only, as the problem is locally convex if the resulting matrix is positive definite, i.e., [Qxx + mu*I, Qxu; Qux, Quu + mu*I]

and note that we are currently doing

[ Qxx*    Qxu*        ]
[ Qux*    Quu* + mu*I ]

where * means computed with the Vxx regularization (i.e., using Vxx' + mu*I inside the Q terms).
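As a side-by-side illustration of the two layouts (a numpy sketch, not the actual Crocoddyl code), note how the Vxx-based regularization propagates mu into every Q block through fx and fu, while the diagonal-only variant touches Qxx and Quu alone.

```python
import numpy as np

def q_blocks(lxx, lxu, luu, fx, fu, vxx_next):
    """Condensed Q-function Hessian blocks for one node."""
    Qxx = lxx + fx.T @ vxx_next @ fx
    Qxu = lxu + fx.T @ vxx_next @ fu
    Quu = luu + fu.T @ vxx_next @ fu
    return Qxx, Qxu, Quu

rng = np.random.default_rng(3)
nx, nu, mu = 3, 2, 0.1
lxx, lxu, luu = np.eye(nx), np.zeros((nx, nu)), np.eye(nu)
fx, fu = rng.standard_normal((nx, nx)), rng.standard_normal((nx, nu))
vxx_next = np.eye(nx)

# Vxx-based scheme (the current behaviour described above): regularize Vxx'
# and add mu*I to Quu, which spreads mu*fx^T*fx, mu*fx^T*fu and mu*fu^T*fu
# over all blocks.
Qxx_v, Qxu_v, Quu_v = q_blocks(lxx, lxu, luu, fx, fu, vxx_next + mu * np.eye(nx))
Quu_v = Quu_v + mu * np.eye(nu)

# Diagonal-only scheme: unregularized Vxx', mu*I added to Qxx and Quu only.
Qxx_d, Qxu_d, Quu_d = q_blocks(lxx, lxu, luu, fx, fu, vxx_next)
Qxx_d = Qxx_d + mu * np.eye(nx)
Quu_d = Quu_d + mu * np.eye(nu)

print(np.allclose(Qxu_v, Qxu_d))   # False: the off-diagonal block differs
```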

Actually, the Vxx regularization can be interpreted as banded regularization: it regularizes the entire banded block of the KKT matrix. However, I am not aware of the numerical implications of doing so, e.g., does it handle potential linear dependencies in the dynamics Jacobians (although it would be odd to have a system with linear dependencies)?

cmastalli commented 3 years ago

@andreadelprete -- I believe we could close this issue after merging #974. Feel free to re-open it if you disagree.