symforce-org / symforce

Fast symbolic computation, code generation, and nonlinear optimization for robotics
https://symforce.org
Apache License 2.0

Unreasonably high covariances #371

Closed: hflemmen closed this issue 11 months ago

hflemmen commented 11 months ago

The Optimizer.compute_all_covariances seems to be giving unreasonably high covariances, even for relatively simple optimization problems.

I tried to create a nearly minimal example with a single factor and a single Pose2:

import symforce

symforce.set_epsilon_to_symbol()

from symforce.values import Values
import symforce.symbolic as sf
from symforce.opt.factor import Factor
from symforce.opt.optimizer import Optimizer

def prior_res(T_w_0: sf.Pose2, pose: sf.Pose2, epsilon: sf.Scalar) -> sf.V1:
    # Collapse the 3-DOF pose error (heading, x, y) into a single scalar norm
    res = sf.V3(sf.wrap_angle(T_w_0.R.to_tangent(epsilon)[0] - pose.R.to_tangent(epsilon)[0]),
                T_w_0.t.x - pose.t.x, T_w_0.t.y - pose.t.y)
    return sf.V1(res.norm())

def main():
    values = Values(
        poses=[sf.Pose2(sf.Rot2.identity(), sf.V2(0.1, 0))],
        priors=[sf.Pose2(sf.Rot2.from_angle(0.1), sf.V2(-1, 0.5))],
        epsilon=sf.numeric_epsilon,
    )
    factors = []
    factors.append(Factor(
        residual=prior_res,
        keys=["poses[0]", "priors[0]", "epsilon"],
    ))
    optimized_keys = ["poses[0]"]
    optimizer = Optimizer(
        factors=factors,
        optimized_keys=optimized_keys,
        debug_stats=True,
        params=Optimizer.Params(verbose=True, iterations=100),
    )

    result = optimizer.optimize(values, populate_best_linearization=True)

    print(f"Num iterations: {len(result.iterations) - 1}")
    print(f"Final error: {result.error():.6f}")
    print(f"Status: {result.status}")
    optimized_pose = result.optimized_values["poses[0]"]
    print(f"Pose: t = {optimized_pose.position()}, heading = {optimized_pose.rotation().to_tangent()[0]}")
    cov = optimizer.compute_all_covariances(result.optimized_values)

    print(f"Covariance: {cov['poses[0]']}")

if __name__ == "__main__":
    main()

This produces the output:

[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter    0] lambda: 1.000e+00, error prev/linear/new: 0.735/0.000/0.184, rel reduction: 0.75000
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter    1] lambda: 2.500e-01, error prev/linear/new: 0.184/0.000/0.007, rel reduction: 0.96000
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter    2] lambda: 6.250e-02, error prev/linear/new: 0.007/0.000/0.000, rel reduction: 0.99654
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter    3] lambda: 1.562e-02, error prev/linear/new: 0.000/0.000/0.000, rel reduction: 0.99976
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter    4] lambda: 3.906e-03, error prev/linear/new: 0.000/0.000/0.000, rel reduction: 0.99998
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter    5] lambda: 9.766e-04, error prev/linear/new: 0.000/0.000/0.000, rel reduction: 0.96462
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter    6] lambda: 2.441e-04, error prev/linear/new: 0.000/0.000/0.000, rel reduction: -30.83974
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter    7] lambda: 9.766e-04, error prev/linear/new: 0.000/0.000/0.000, rel reduction: -26.87693
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter    8] lambda: 3.906e-03, error prev/linear/new: 0.000/0.000/0.000, rel reduction: -16.74719
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter    9] lambda: 1.562e-02, error prev/linear/new: 0.000/0.000/0.000, rel reduction: -4.86417
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter   10] lambda: 6.250e-02, error prev/linear/new: 0.000/0.000/0.000, rel reduction: -0.55206
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter   11] lambda: 2.500e-01, error prev/linear/new: 0.000/0.000/0.000, rel reduction: -0.02417
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter   12] lambda: 1.000e+00, error prev/linear/new: 0.000/0.000/0.000, rel reduction: 0.00341
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter   13] lambda: 2.500e-01, error prev/linear/new: 0.000/0.000/0.000, rel reduction: -0.00000
[2023-10-24 13:44:23.531] [info] LM<sym::Optimize> [iter   14] lambda: 1.000e+00, error prev/linear/new: 0.000/0.000/0.000, rel reduction: 0.00000
Num iterations: 15
Final error: 0.000000
Status: optimization_status_t.SUCCESS
Pose: t = [-1.   0.5], heading = 0.09999999850249877
Covariance: [[4.50359963e+14 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 4.50359963e+14 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 4.50359963e+14]]

As you can see, the covariance is on the order of 10^14, which seems unreasonably high.

This also happens with other, more complex factors I have implemented, but not with the examples in the documentation. That suggests the prior_res residual is the problem, but I don't see what is wrong with it.

Environment:

aaron-skydio commented 11 months ago

I would recommend using a residual like:

def prior_res(T_w_0: sf.Pose2, pose: sf.Pose2, epsilon: sf.Scalar) -> sf.V3:
    res = sf.V3(sf.wrap_angle(T_w_0.R.to_tangent(epsilon)[0] - pose.R.to_tangent(epsilon)[0]),
                T_w_0.t.x - pose.t.x, T_w_0.t.y - pose.t.y)
    return res

Unless you have a specific reason to use the norm as the residual instead. You could maybe also just use symforce.codegen.geo_factors_codegen.prior_factor?

The residual you have with the norm has the same optimum as the suggested residual, but the shape of the cost function is different. In particular, as a 1-DOF residual you'll never get a proper covariance, since the Gauss-Newton Hessian J^T J will be rank 1. Levenberg-Marquardt is a pretty robust optimization algorithm, so it does seem to converge with either version, but it should also converge in fewer iterations with the recommended residual.
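
To make that rank argument concrete, here is a minimal numpy sketch (illustration only, not SymForce internals; the Jacobian entries are made up) of why a 1-DOF residual over a 3-DOF Pose2 leaves the Gauss-Newton Hessian singular, so any covariance derived from it blows up:

# Illustration only, not SymForce internals: a 1-DOF residual over a 3-DOF variable
# gives a rank-1 Gauss-Newton Hessian, while a 3-DOF residual gives a full-rank one.
import numpy as np

# Norm-style residual: J is 1x3, so J^T J is 3x3 but rank 1.
J_norm = np.array([[0.3, -0.8, 0.5]])  # made-up Jacobian of the scalar residual
H_norm = J_norm.T @ J_norm
print(np.linalg.matrix_rank(H_norm))   # 1 -> singular, so the "covariance" blows up

# Vector residual: J is 3x3 and (generically) full rank, so J^T J is invertible.
J_vec = np.eye(3)                      # Jacobian of the recommended 3-DOF residual
H_vec = J_vec.T @ J_vec
print(np.linalg.inv(H_vec))            # identity -> a sensible, finite covariance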

hflemmen commented 11 months ago

Thank you! I think this is what I was looking for.

I don't have a specific reason to use the norm in the residual. In fact, I would like to use a weighted norm like res.T * sigma * res, where sigma is a diagonal information matrix, but I simplified it for the example. It now seems to me that I can use symforce.opt.noise_models.DiagonalNoiseModel, call whiten(), and just return the whitened residual as a V3 to do what I want.

Perhaps a dumb question from someone who has not understood Levenberg-Marquardt properly: why do your bearing_residual and odometry_residual from the example in the readme give reasonable covariances? Both of them are 1-DOF residuals?

aaron-skydio commented 11 months ago

I don't have a specific reason to use the norm in the residual. In fact, I would like to use a weighted norm like res.T * sigma * res, where sigma is a diagonal information matrix, but I simplified it for the example. It now seems to me that I can use symforce.opt.noise_models.DiagonalNoiseModel, call whiten(), and just return the whitened residual as a V3 to do what I want.

Yep, exactly. The way to get a cost like res.T * A * res is to use a residual like sqrt(A) * res, which the DiagonalNoiseModel will do automatically inside.
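
For reference, here is a hand-written sketch of what that whitened 3-DOF prior residual could look like; the sigma values below are made-up illustration numbers, and DiagonalNoiseModel.whiten() is the library way to apply the same sqrt-information scaling:

import symforce.symbolic as sf

def whitened_prior_res(T_w_0: sf.Pose2, pose: sf.Pose2, epsilon: sf.Scalar) -> sf.V3:
    # Same 3-DOF error as the recommended residual above
    res = sf.V3(
        sf.wrap_angle(T_w_0.R.to_tangent(epsilon)[0] - pose.R.to_tangent(epsilon)[0]),
        T_w_0.t.x - pose.t.x,
        T_w_0.t.y - pose.t.y,
    )
    # Dividing each component by its sigma makes the optimizer's summed-squares cost
    # the information-weighted res.T * Sigma^-1 * res (up to the overall 1/2 factor).
    sigmas = sf.V3(0.01, 0.1, 0.1)  # hypothetical heading / x / y standard deviations
    return sf.V3(res[0] / sigmas[0], res[1] / sigmas[1], res[2] / sigmas[2])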

Perhaps a dumb question from someone who has not understood Levenberg-Marquardt properly: why do your bearing_residual and odometry_residual from the example in the readme give reasonable covariances? Both of them are 1-DOF residuals?

Not a dumb question at all :). The bearing_residual and odometry_residual in the readme are fundamentally 1-DOF measurements; it's impossible to solve for a Pose2 given just one bearing_residual or odometry_residual measurement (in particular for the odometry residual, we're pretending you have a sensor that just measures your distance traveled, not the full relative pose). If you set up a problem with fewer than three total bearing_residual and/or odometry_residual factors per pose, you'd get an infinite (or very large) covariance. But if you have enough of them to fully constrain the result, you'll get a proper covariance.

hflemmen commented 11 months ago

That makes sense, but I am still a little confused. I think my confusion is about the difference between my residual and the one you recommended. The first returns a scalar, while the second returns a vector. Still, I imagine that even in the latter case they are combined into a single cost (probably by summing their squares?), which again would make them equivalent. I.e., why does it matter whether they are combined within the residual function or by the optimizer itself?

Fewer than three bearing/odometry residuals per pose would make it impossible to estimate a 3-DOF pose, but in my toy example the problem is well constrained: we have the (x, y, theta) from the prior to constrain the (x, y, theta) we are trying to estimate. I would consider these two different cases: one where the problem is sufficiently constrained within a single residual, and one where several residuals are needed to constrain it.

aaron-skydio commented 11 months ago

That makes sense, but I am still a little confused. I think my confusion is about the difference between my residual and the one you recommended. The first returns a scalar, while the second returns a vector. Still, I imagine that even in the latter case they are combined into a single cost (probably by summing their squares?), which again would make them equivalent. I.e., why does it matter whether they are combined within the residual function or by the optimizer itself?

It matters because of the approximation Gauss-Newton (and Levenberg-Marquardt) makes for the Hessian of the problem. The costs you get from the two formulations are essentially the same, because yes, the cost optimized by Gauss-Newton is the sum of squares of the residuals. Gauss-Newton doesn't use the exact Hessian of this cost, though; it uses the approximation described here, which does give different results for the two formulations. In particular, having only one residual dimension necessarily means that the Gauss-Newton approximation of the Hessian, J^T J, will be rank 1, even though taking the true Hessian of the sum of squares would give something that's full rank. And the covariances are derived from the Gauss-Newton Hessian, not the true Hessian.
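
As a small illustration of that last point (plain sympy rather than SymForce, and a 2-DOF toy variable instead of a Pose2): the two formulations share the same cost, but their Gauss-Newton Hessians have different rank, while the true Hessian of the cost is full rank either way.

import sympy as sp

x, y = sp.symbols("x y", real=True)

# Vector residual r = [x, y]: the Gauss-Newton Hessian J^T J is full rank.
r_vec = sp.Matrix([x, y])
J_vec = r_vec.jacobian([x, y])
print(J_vec.T * J_vec)                         # identity, rank 2

# Norm residual r = sqrt(x^2 + y^2): same cost, but J^T J is rank 1 everywhere.
r_norm = sp.Matrix([sp.sqrt(x**2 + y**2)])
J_norm = r_norm.jacobian([x, y])
print(sp.simplify((J_norm.T * J_norm).det()))  # 0 -> rank deficient

# The true Hessian of the shared cost x^2 + y^2 is full rank.
print(sp.hessian(x**2 + y**2, (x, y)))         # 2*I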

hflemmen commented 11 months ago

Thank you for the explanation! It makes sense now.