Closed. hflemmen closed this 11 months ago.
I would recommend using a residual like:

```python
def prior_res(T_w_0: sf.Pose2, pose: sf.Pose2, epsilon: sf.Scalar) -> sf.V3:
    res = sf.V3(
        sf.wrap_angle(T_w_0.R.to_tangent(epsilon)[0] - pose.R.to_tangent(epsilon)[0]),
        T_w_0.t.x - pose.t.x,
        T_w_0.t.y - pose.t.y,
    )
    return res
```

unless you have a specific reason to use the norm as the residual instead. You could maybe also just use `symforce.codegen.geo_factors_codegen.prior_factor`?
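As a side note on the `sf.wrap_angle` call above: here is a plain-Python sketch (using `math.atan2` numerically, rather than the symbolic `sf.wrap_angle`) of why the angle difference needs wrapping:

```python
import math

def wrap_angle(theta: float) -> float:
    # Map theta into (-pi, pi], mirroring numerically what sf.wrap_angle
    # does symbolically.
    return math.atan2(math.sin(theta), math.cos(theta))

# Two headings roughly 0.08 rad apart, but on opposite sides of the +/-pi seam:
a, b = 3.1, -3.1
raw = a - b                  # 6.2: looks like a huge angular error
wrapped = wrap_angle(a - b)  # ~ -0.083: the actual small error
print(raw, wrapped)
```

Without the wrap, the prior would pull the heading the long way around the circle.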
The residual you have with the norm has the same optimum as the suggested residual, but the shape of the cost function is different. In particular, as a 1-dof residual you'll never get a proper covariance, since the Gauss-Newton Hessian `J^T J` will be rank 1. Levenberg-Marquardt as an optimization algorithm is pretty robust, so it does seem to converge with either version, but it should also converge in fewer iterations with the recommended residual.
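To make the rank deficiency concrete, here is a small numpy sketch (with a made-up error vector, not SymForce's actual linearization) of why a rank-1 `J^T J` ruins the covariance:

```python
import numpy as np

# A hypothetical 3-dof error at the linearization point.
e = np.array([0.3, -0.2, 0.5])

# For the scalar residual r = ||e||, the Jacobian is the 1x3 row e.T / ||e||,
# so J.T @ J is an outer product and has rank 1.
J = (e / np.linalg.norm(e)).reshape(1, 3)
H = J.T @ J
print(np.linalg.matrix_rank(H))  # 1: singular, no well-defined covariance

# Inverting a barely-regularized version (roughly what a damped solver sees)
# produces astronomically large entries, like those reported in the issue.
cov = np.linalg.inv(H + 1e-12 * np.eye(3))
print(f"{cov.max():.1e}")        # on the order of 1e11-1e12
```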
Thank you! I think this is what I was looking for.
I don't have a specific reason to use the norm in the residual. In fact, I would like to use a norm similar to `res.T * sigma * res`, where `sigma` is a diagonal information matrix, but I simplified it for the example. Now it seems to me that I can use the `symforce.opt.noise_models.DiagonalNoiseModel`, `whiten()`, and just return the whitened residual as a `V3` to do what I want.
Perhaps a dumb question from someone who has not understood Levenberg-Marquardt properly: why do your `bearing_residual` and `odometry_residual` from the example in the README give reasonable covariances? Both of them are 1-dof residuals.
> I don't have a specific reason to use the norm in the residual. In fact, I would like to use a norm similar to `res.T * sigma * res`, where `sigma` is a diagonal information matrix, but I simplified it for the example. Now it seems to me that I can use the `symforce.opt.noise_models.DiagonalNoiseModel`, `whiten()` and just return the whitened residual as a `V3` to do what I want.
Yep, exactly. The way to get a cost like `res.T * A * res` is to use a residual like `sqrt(A) * res`, which the `DiagonalNoiseModel` will do automatically inside.
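A minimal numpy sketch of that identity (plain arrays, not the `DiagonalNoiseModel` API itself): whitening the residual with `sqrt(A)` makes the optimizer's sum-of-squares cost equal `res.T * A * res`:

```python
import numpy as np

res = np.array([0.1, -0.4, 0.25])
A_diag = np.array([4.0, 9.0, 1.0])  # diagonal of the information matrix A

# Whitening: multiply elementwise by sqrt(A), which is what whitening does
# for a diagonal noise model.
whitened = np.sqrt(A_diag) * res

cost_whitened = whitened @ whitened        # sum of squares the solver minimizes
cost_direct = res @ np.diag(A_diag) @ res  # res.T * A * res
print(cost_whitened, cost_direct)          # both 1.5425
```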
> Perhaps a dumb question from someone who has not understood Levenberg-Marquardt properly: Why do your `bearing_residual` and `odometry_residual` from the example on the readme give reasonable covariances? Both of them are 1 dof residuals?
Not a dumb question at all :). The `bearing_residual` and `odometry_residual` in the README are fundamentally 1-dof measurements: it's impossible to solve for a `Pose2` given just one `bearing_residual` or `odometry_residual` measurement (in particular for the odometry residual, we're pretending you have a sensor that just measures your distance traveled, and not the full relative pose). If you did set up a problem with fewer than three total `bearing_residual` and/or `odometry_residual` factors per pose, you'd get an infinite (or very large) covariance. But if you have enough of them to fully constrain the result, you'll get a proper covariance.
That makes sense, but I am still a little confused. I think my confusion is about the difference between my residual and the one you recommended. The first returns a scalar, but the second returns a vector. Still, I imagine that even in the latter case they are combined into a single cost (probably by summing their squares?), which again would make them equivalent. I.e., why does it matter whether they are combined within the residual function or by the optimizer itself?

Fewer than three bearing/odometry residuals per pose would make it impossible to estimate a 3-dof pose, but in my toy example the problem is well constrained: we have the (x, y, theta) from the prior to constrain the (x, y, theta) we try to estimate. I would consider these two different cases: one where the problem is sufficiently constrained within a single residual, and one where we need several residuals to constrain it.
> That makes sense, but I am still a little confused. I think my confusion is about the difference between my residual and the one you recommended. The first returns a scalar, but the second returns a vector. Still, I imagine that even in the latter case, they are combined to a single cost (probably by summing their squares?), which again would make them equivalent? I.e. why does it matter if they are combined within the residual function, or by the optimizer itself?
It matters because of the approximation Gauss-Newton (and Levenberg-Marquardt) makes for the Hessian of the problem. The costs you get from the two formulations are essentially the same, because yes, the cost optimized by Gauss-Newton is the sum of squares of the residuals. Gauss-Newton doesn't use the exact Hessian of this cost, though; it uses the approximation described here, which does give different results for the two formulations. In particular, having only one residual necessarily means that the Gauss-Newton approximation of the Hessian will be rank 1, even though taking the true Hessian of the sum of squares would give something that's full rank. And the covariances are derived from the Gauss-Newton Hessian, not the true Hessian.
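A toy numpy illustration of that gap (a made-up quadratic problem, not SymForce itself): both formulations share the cost `||x - t||^2`, but only the vector formulation's Gauss-Newton Hessian matches the true Hessian's rank:

```python
import numpy as np

# Toy 3-dof problem with cost C(x) = ||x - t||^2.
t = np.array([1.0, 2.0, 0.5])
x = np.array([0.3, 1.1, 0.9])
e = x - t

# True Hessian of C is 2*I: full rank, regardless of how C is written.
H_true = 2.0 * np.eye(3)

# Vector residual e(x): J = I, so the Gauss-Newton Hessian J.T @ J = I
# is full rank (and proportional to the true Hessian here).
H_gn_vector = np.eye(3)

# Scalar residual s(x) = ||e(x)||: same cost s^2, but J_s = e.T / ||e||
# is 1x3, so J_s.T @ J_s is a rank-1 outer product.
J_s = (e / np.linalg.norm(e)).reshape(1, 3)
H_gn_scalar = J_s.T @ J_s

print(np.linalg.matrix_rank(H_true),       # 3
      np.linalg.matrix_rank(H_gn_vector),  # 3
      np.linalg.matrix_rank(H_gn_scalar))  # 1
```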
Thank you for the explanation! It makes sense now.
The `Optimizer.compute_all_covariances` seems to be giving unreasonably high covariances, even for relatively simple optimization problems. I tried to create a nearly minimal example with a single factor and a single `Pose2`:

This produces the output:

As you see, the covariance is on the order of magnitude 10^13 or 10^14, which seems unreasonably high.

This happens with other, more complex factors I have implemented, but not with the examples in the documentation. This implies that it might be the `prior_res` residual that is the problem, but I don't see what the problem might be.

Environment: