symforce-org / symforce

Fast symbolic computation, code generation, and nonlinear optimization for robotics
https://symforce.org
Apache License 2.0

Poor accuracy versus Ceres in VSLAM #398

Open empty-spacebar opened 1 month ago

empty-spacebar commented 1 month ago

Hi,

I replaced Ceres with SymForce in a graph-based VSLAM backend, which results in poor accuracy. Here are the details:

[Screenshots 1 and 2: images attached to the original issue]

Environment

Additional context

  1. Ceres usually converges within 10 iterations, while SymForce can hardly converge within 50.
  2. Drastic drift occurs when using the sym::Optimizer parameters that the official examples use (as the screenshots above show).
  3. I have already verified the validity of the values and the factor graph built for the problem.
aaron-skydio commented 1 month ago

It's hard to say much here without knowing a lot more about what you're doing (what do you mean when you say graph-based VSLAM, what factors are included, what initialization, etc).

It would also be useful to know what optimization parameters are being used with Ceres, or if Ceres is selecting any of those automatically, which it's choosing (which optimizer, which linear solver, inner iterations, etc).

It looks like you've tried the DYNAMIC lambda parameter, which is one thing that can improve convergence in more nonlinear problems like bundle adjustment.

One other thing that would be useful to know is whether the SymForce optimizer converges eventually, and how many iterations it takes, if you increase the iteration limit.

empty-spacebar commented 1 month ago

It's hard to say much here without knowing a lot more about what you're doing (what do you mean when you say graph-based VSLAM, what factors are included, what initialization, etc).

The graph-based VSLAM framework is VINS-Mono: a feature-based frontend and a factor-graph backend originally optimized with Gauss–Newton. It optimizes camera poses, velocities, IMU biases, and the inverse depth of landmark points.

It would also be useful to know what optimization parameters are being used with Ceres, or if Ceres is selecting any of those automatically, which it's choosing (which optimizer, which linear solver, inner iterations, etc).

Here are the Ceres parameters that work well in VINS:

options.max_num_iterations = 8;
options.linear_solver_type = ceres::DENSE_SCHUR;
options.trust_region_strategy_type = ceres::DOGLEG;

SymForce doesn't have DOGLEG, so I replaced ceres::DOGLEG with LM in SymForce like this:

using DenseOptimizer =
    sym::Optimizer<Scalar, sym::LevenbergMarquardtSolver<Scalar, sym::DenseCholeskySolver<Scalar>>>; 

I also tried the default sparse decomposition (sym::Optimizer<double> optimizer). Neither runs well. Maybe the contrast between Ceres and SymForce comes from the performance difference between LM and dogleg? So here comes the question: will SymForce support more iterative optimization algorithms like dogleg in the future?

It looks like you've tried the DYNAMIC lambda parameter, which is one thing that can improve convergence in more nonlinear problems like bundle adjustment.

The DYNAMIC lambda parameter doesn't work: with it, VINS runs even worse. Thus a static lambda is used, as in the second image above.

One other thing that would be useful to know is whether the SymForce optimizer converges eventually, and how many iterations it takes, if you increase the iteration limit.

As for whether the SymForce optimizer converges eventually, that is shown in the fourth image above: it seldom converges, usually diverges, and ends up hitting the iteration limit (50 iterations).

aaron-skydio commented 1 month ago

Again I'm not sure how much we're going to be able to debug this over GitHub issues, so this may be a dead end, but a couple more thoughts:

Here are the Ceres parameters that work well in VINS

Is Ceres using inner iterations? It might help to have the Ceres solver output, e.g. ceres::Solver::Summary::FullReport(). It might be useful to run Ceres with LM and no inner iterations, which would be closer to the SymForce optimizer.
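
A minimal sketch of such a run, assuming a ceres::Problem named problem has already been built (everything here is standard Ceres API):

#include <ceres/ceres.h>
#include <iostream>

// Configure Ceres to be closer to the SymForce optimizer: LM trust region,
// no inner iterations, then print the full solver report.
ceres::Solver::Options options;
options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;
options.linear_solver_type = ceres::DENSE_SCHUR;
options.use_inner_iterations = false;  // the Ceres default, stated explicitly
options.max_num_iterations = 50;       // match the SymForce iteration limit

ceres::Solver::Summary summary;
ceres::Solve(options, &problem, &summary);
std::cout << summary.FullReport() << std::endl;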

The DYNAMIC lambda parameter doesn't work: with it, VINS runs even worse. Thus a static lambda is used, as in the second image above.

I'm not too shocked by this, but I'll note that DYNAMIC is basically what Ceres does by default in their LM implementation, while STATIC is not.

As for whether the SymForce optimizer converges eventually, that is shown in the fourth image above: it seldom converges, usually diverges, and ends up hitting the iteration limit (50 iterations).

What I was asking is, if you set the SymForce iteration limit much higher, what happens? It's much more useful to know what SymForce would do with an unlimited number of iterations (does it converge in 60? 100? 1000? or get stuck?) (does optimizing to convergence give the same solution as Ceres?) than knowing that it often takes more than 50.

It is often also useful to see the full verbose optimization trace (by setting the verbose parameter to true), to know whether the optimizer is making good progress, and if not why not. Separately, it can be useful to visualize the values at intermediate steps of the optimization (you can save the Values for each iteration in the OptimizationStats by turning on debug_stats, and you seem to have some visualization tools set up already that you could probably use for this)
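
A rough sketch of that setup (field names are from sym::optimizer_params_t, the constructor and Optimize signatures may differ slightly across SymForce versions, and the factors and values here are assumed to already exist):

sym::optimizer_params_t params = sym::DefaultOptimizerParams();
params.verbose = true;      // print the per-iteration LM trace
params.debug_stats = true;  // record per-iteration stats in OptimizationStats
params.iterations = 500;    // raise the limit to see eventual convergence

sym::Optimizer<double> optimizer(params, std::move(factors));
const auto stats = optimizer.Optimize(values);
// With debug_stats on, stats.iterations holds one entry per iteration,
// which can be used to inspect or visualize intermediate state.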

empty-spacebar commented 1 month ago

Is Ceres using inner iterations? It might help to have the Ceres solver output, e.g. ceres::Solver::Summary::FullReport()

I didn't set the option 'use_inner_iterations', and since its default value is false, I assume inner iterations are OFF in my case. I packed up all my test logs; I hope the file names are clear enough to show where they came from. I also took screenshots of the trajectories of every test. symforce_test_20240726.zip

What I was asking is, if you set the SymForce iteration limit much higher, what happens? It's much more useful to know what SymForce would do with an unlimited number of iterations (does it converge in 60? 100? 1000? or get stuck?) (does optimizing to convergence give the same solution as Ceres?) than knowing that it often takes more than 50.

The log of this test is also recorded in the file above. It's interesting that even when Ceres runs well, it doesn't converge either (mostly it ends after 4 iterations, stopping because it runs out of iterations rather than meeting a convergence criterion). The reason SymForce runs worse is that its residual decreases VERY SLOWLY: sometimes it takes 50 iterations to get a good result (see sym_LM_dense_cholesky.png), and sometimes it just crashes.

Even more interesting, Ceres doesn't run well every time (see Ceres_LM_dense_cholesky1~3: exactly the same parameters, but different results).

Here are some other discoveries:

  1. Ceres runs awfully with LM, no matter whether dense Schur or dense Cholesky is used.
  2. Ceres with LM and sparse Cholesky reports an error like 'dense matrix decomposition fail' but still gets the right trajectory.
  3. SymForce with LM and sparse Cholesky runs badly; with dense Cholesky and 50 iterations it runs well.
  4. When lambda is set to zero, the program reports: 'Internal Error: Damped hessian factorization failed.'

aaron-skydio commented 1 month ago

That's actually pretty helpful, thanks

Optimization traces like this one are exactly the kind that should be improved by dynamic lambda:

[2024-07-26 14:11:26.634] [info] LM<sym::Optimize> [iter    0] lambda: 1.000e+00, error prev/linear/new: 2.010e+02/2.009e+02/2.010e+02, rel reduction: 2.45473e-04, gain ratio: 5.14165e-01
[2024-07-26 14:11:26.636] [info] LM<sym::Optimize> [iter    1] lambda: 1.000e-01, error prev/linear/new: 2.010e+02/2.008e+02/2.011e+02, rel reduction: -4.78170e-04, gain ratio: -6.30568e-01
[2024-07-26 14:11:26.639] [info] LM<sym::Optimize> [iter    2] lambda: 1.000e+00, error prev/linear/new: 2.010e+02/2.009e+02/2.009e+02, rel reduction: 1.02077e-04, gain ratio: 3.61970e-01
[2024-07-26 14:11:26.641] [info] LM<sym::Optimize> [iter    3] lambda: 1.000e-01, error prev/linear/new: 2.009e+02/2.008e+02/2.011e+02, rel reduction: -6.08975e-04, gain ratio: -1.06610e+00
[2024-07-26 14:11:26.643] [info] LM<sym::Optimize> [iter    4] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 4.45069e-05, gain ratio: 2.27182e-01
[2024-07-26 14:11:26.645] [info] LM<sym::Optimize> [iter    5] lambda: 1.000e-01, error prev/linear/new: 2.009e+02/2.008e+02/2.011e+02, rel reduction: -6.62745e-04, gain ratio: -1.44154e+00
[2024-07-26 14:11:26.647] [info] LM<sym::Optimize> [iter    6] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.60077e-05, gain ratio: 1.08418e-01
[2024-07-26 14:11:26.650] [info] LM<sym::Optimize> [iter    7] lambda: 1.000e-01, error prev/linear/new: 2.009e+02/2.008e+02/2.011e+02, rel reduction: -6.79496e-04, gain ratio: -1.75608e+00
[2024-07-26 14:11:26.652] [info] LM<sym::Optimize> [iter    8] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.89187e-06, gain ratio: 1.61017e-02
[2024-07-26 14:11:26.654] [info] LM<sym::Optimize> [iter    9] lambda: 1.000e-01, error prev/linear/new: 2.009e+02/2.009e+02/2.011e+02, rel reduction: -6.77233e-04, gain ratio: -2.01352e+00
[2024-07-26 14:11:26.656] [info] LM<sym::Optimize> [iter   10] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -4.29621e-06, gain ratio: -4.41172e-02
[2024-07-26 14:11:26.659] [info] LM<sym::Optimize> [iter   11] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.68324e-05, gain ratio: 1.27198e+00
[2024-07-26 14:11:26.661] [info] LM<sym::Optimize> [iter   12] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -4.27360e-06, gain ratio: -4.47309e-02
[2024-07-26 14:11:26.664] [info] LM<sym::Optimize> [iter   13] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.65771e-05, gain ratio: 1.28204e+00
[2024-07-26 14:11:26.666] [info] LM<sym::Optimize> [iter   14] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -4.21685e-06, gain ratio: -4.49676e-02
[2024-07-26 14:11:26.668] [info] LM<sym::Optimize> [iter   15] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.63717e-05, gain ratio: 1.29433e+00
[2024-07-26 14:11:26.671] [info] LM<sym::Optimize> [iter   16] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -4.13596e-06, gain ratio: -4.49176e-02
[2024-07-26 14:11:26.673] [info] LM<sym::Optimize> [iter   17] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.62052e-05, gain ratio: 1.30846e+00
[2024-07-26 14:11:26.676] [info] LM<sym::Optimize> [iter   18] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -4.03752e-06, gain ratio: -4.46405e-02
[2024-07-26 14:11:26.678] [info] LM<sym::Optimize> [iter   19] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.60693e-05, gain ratio: 1.32411e+00
[2024-07-26 14:11:26.681] [info] LM<sym::Optimize> [iter   20] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -3.92577e-06, gain ratio: -4.41744e-02
[2024-07-26 14:11:26.683] [info] LM<sym::Optimize> [iter   21] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.59577e-05, gain ratio: 1.34101e+00
[2024-07-26 14:11:26.685] [info] LM<sym::Optimize> [iter   22] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -3.80335e-06, gain ratio: -4.35422e-02
[2024-07-26 14:11:26.687] [info] LM<sym::Optimize> [iter   23] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.58654e-05, gain ratio: 1.35893e+00
[2024-07-26 14:11:26.689] [info] LM<sym::Optimize> [iter   24] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -3.67189e-06, gain ratio: -4.27562e-02
[2024-07-26 14:11:26.691] [info] LM<sym::Optimize> [iter   25] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.57885e-05, gain ratio: 1.37768e+00
[2024-07-26 14:11:26.693] [info] LM<sym::Optimize> [iter   26] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -3.53228e-06, gain ratio: -4.18221e-02
[2024-07-26 14:11:26.696] [info] LM<sym::Optimize> [iter   27] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.57240e-05, gain ratio: 1.39712e+00
[2024-07-26 14:11:26.698] [info] LM<sym::Optimize> [iter   28] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -3.38502e-06, gain ratio: -4.07408e-02
[2024-07-26 14:11:26.700] [info] LM<sym::Optimize> [iter   29] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.56696e-05, gain ratio: 1.41712e+00
[2024-07-26 14:11:26.702] [info] LM<sym::Optimize> [iter   30] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -3.23032e-06, gain ratio: -3.95103e-02
[2024-07-26 14:11:26.705] [info] LM<sym::Optimize> [iter   31] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.56235e-05, gain ratio: 1.43758e+00
[2024-07-26 14:11:26.707] [info] LM<sym::Optimize> [iter   32] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -3.06825e-06, gain ratio: -3.81271e-02
[2024-07-26 14:11:26.709] [info] LM<sym::Optimize> [iter   33] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.55842e-05, gain ratio: 1.45842e+00
[2024-07-26 14:11:26.712] [info] LM<sym::Optimize> [iter   34] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -2.89880e-06, gain ratio: -3.65868e-02
[2024-07-26 14:11:26.714] [info] LM<sym::Optimize> [iter   35] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.55506e-05, gain ratio: 1.47957e+00
[2024-07-26 14:11:26.717] [info] LM<sym::Optimize> [iter   36] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -2.72195e-06, gain ratio: -3.48846e-02
[2024-07-26 14:11:26.721] [info] LM<sym::Optimize> [iter   37] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.55218e-05, gain ratio: 1.50097e+00
[2024-07-26 14:11:26.724] [info] LM<sym::Optimize> [iter   38] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -2.53767e-06, gain ratio: -3.30160e-02
[2024-07-26 14:11:26.726] [info] LM<sym::Optimize> [iter   39] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.54970e-05, gain ratio: 1.52259e+00
[2024-07-26 14:11:26.729] [info] LM<sym::Optimize> [iter   40] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -2.34597e-06, gain ratio: -3.09769e-02
[2024-07-26 14:11:26.732] [info] LM<sym::Optimize> [iter   41] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.54756e-05, gain ratio: 1.54438e+00
[2024-07-26 14:11:26.734] [info] LM<sym::Optimize> [iter   42] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -2.14689e-06, gain ratio: -2.87635e-02
[2024-07-26 14:11:26.737] [info] LM<sym::Optimize> [iter   43] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.54573e-05, gain ratio: 1.56632e+00
[2024-07-26 14:11:26.739] [info] LM<sym::Optimize> [iter   44] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -1.94050e-06, gain ratio: -2.63729e-02
[2024-07-26 14:11:26.742] [info] LM<sym::Optimize> [iter   45] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.54416e-05, gain ratio: 1.58837e+00
[2024-07-26 14:11:26.745] [info] LM<sym::Optimize> [iter   46] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -1.72693e-06, gain ratio: -2.38027e-02
[2024-07-26 14:11:26.747] [info] LM<sym::Optimize> [iter   47] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.54282e-05, gain ratio: 1.61053e+00
[2024-07-26 14:11:26.750] [info] LM<sym::Optimize> [iter   48] lambda: 1.000e+00, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: -1.50632e-06, gain ratio: -2.10511e-02
[2024-07-26 14:11:26.752] [info] LM<sym::Optimize> [iter   49] lambda: 1.000e+01, error prev/linear/new: 2.009e+02/2.009e+02/2.009e+02, rel reduction: 1.54169e-05, gain ratio: 1.63278e+00
[2024-07-26 14:11:26.752] [warning] LM<sym::Optimize> Optimization finished with status: HIT_ITERATION_LIMIT

Traces like this one are a little weird, but I think they may be resolved by using a smaller min lambda. The gain ratio is consistently high (>>1), which is a bit weird but arguably good (the optimizer is making much more progress than it expects based on the current linearization). This may suggest the optimizer is making good progress even though the problem is highly nonlinear. Anyway, I'd try decreasing the min lambda.

[2024-07-26 14:11:26.768] [info] LM<sym::Optimize> [iter    0] lambda: 1.000e+00, error prev/linear/new: 2.310e+02/2.309e+02/2.309e+02, rel reduction: 1.74157e-04, gain ratio: 1.19171e+00
[2024-07-26 14:11:26.770] [info] LM<sym::Optimize> [iter    1] lambda: 1.000e-01, error prev/linear/new: 2.309e+02/2.309e+02/2.308e+02, rel reduction: 4.50963e-04, gain ratio: 1.44795e+00
[2024-07-26 14:11:26.773] [info] LM<sym::Optimize> [iter    2] lambda: 1.000e-02, error prev/linear/new: 2.308e+02/2.307e+02/2.308e+02, rel reduction: 2.02472e-04, gain ratio: 4.42505e-01
[2024-07-26 14:11:26.776] [info] LM<sym::Optimize> [iter    3] lambda: 1.000e-03, error prev/linear/new: 2.308e+02/2.307e+02/2.307e+02, rel reduction: 2.31655e-04, gain ratio: 7.88559e-01
[2024-07-26 14:11:26.778] [info] LM<sym::Optimize> [iter    4] lambda: 1.000e-04, error prev/linear/new: 2.307e+02/2.307e+02/2.306e+02, rel reduction: 5.22702e-04, gain ratio: 8.66064e+00
[2024-07-26 14:11:26.782] [info] LM<sym::Optimize> [iter    5] lambda: 1.000e-05, error prev/linear/new: 2.306e+02/2.306e+02/2.305e+02, rel reduction: 6.61930e-04, gain ratio: 2.43387e+01
[2024-07-26 14:11:26.785] [info] LM<sym::Optimize> [iter    6] lambda: 1.000e-06, error prev/linear/new: 2.305e+02/2.305e+02/2.303e+02, rel reduction: 6.44922e-04, gain ratio: 3.96905e+01
[2024-07-26 14:11:26.788] [info] LM<sym::Optimize> [iter    7] lambda: 1.000e-07, error prev/linear/new: 2.303e+02/2.303e+02/2.302e+02, rel reduction: 6.61305e-04, gain ratio: 1.66814e+01
[2024-07-26 14:11:26.790] [info] LM<sym::Optimize> [iter    8] lambda: 1.000e-08, error prev/linear/new: 2.302e+02/2.301e+02/2.300e+02, rel reduction: 8.40862e-04, gain ratio: 5.89807e+00
[2024-07-26 14:11:26.792] [info] LM<sym::Optimize> [iter    9] lambda: 1.000e-08, error prev/linear/new: 2.300e+02/2.300e+02/2.298e+02, rel reduction: 8.02504e-04, gain ratio: 1.42988e+01
[2024-07-26 14:11:26.795] [info] LM<sym::Optimize> [iter   10] lambda: 1.000e-08, error prev/linear/new: 2.298e+02/2.298e+02/2.296e+02, rel reduction: 8.01375e-04, gain ratio: 3.11727e+01
[2024-07-26 14:11:26.797] [info] LM<sym::Optimize> [iter   11] lambda: 1.000e-08, error prev/linear/new: 2.296e+02/2.296e+02/2.294e+02, rel reduction: 8.08292e-04, gain ratio: 5.40277e+01
[2024-07-26 14:11:26.800] [info] LM<sym::Optimize> [iter   12] lambda: 1.000e-08, error prev/linear/new: 2.294e+02/2.294e+02/2.292e+02, rel reduction: 8.15082e-04, gain ratio: 7.31191e+01
[2024-07-26 14:11:26.802] [info] LM<sym::Optimize> [iter   13] lambda: 1.000e-08, error prev/linear/new: 2.292e+02/2.292e+02/2.290e+02, rel reduction: 8.20049e-04, gain ratio: 8.38013e+01
[2024-07-26 14:11:26.804] [info] LM<sym::Optimize> [iter   14] lambda: 1.000e-08, error prev/linear/new: 2.290e+02/2.290e+02/2.288e+02, rel reduction: 8.23287e-04, gain ratio: 8.85802e+01
[2024-07-26 14:11:26.807] [info] LM<sym::Optimize> [iter   15] lambda: 1.000e-08, error prev/linear/new: 2.288e+02/2.288e+02/2.287e+02, rel reduction: 8.25248e-04, gain ratio: 9.05509e+01
[2024-07-26 14:11:26.809] [info] LM<sym::Optimize> [iter   16] lambda: 1.000e-08, error prev/linear/new: 2.287e+02/2.287e+02/2.285e+02, rel reduction: 8.26344e-04, gain ratio: 9.13594e+01
[2024-07-26 14:11:26.812] [info] LM<sym::Optimize> [iter   17] lambda: 1.000e-08, error prev/linear/new: 2.285e+02/2.285e+02/2.283e+02, rel reduction: 8.26879e-04, gain ratio: 9.16996e+01
[2024-07-26 14:11:26.814] [info] LM<sym::Optimize> [iter   18] lambda: 1.000e-08, error prev/linear/new: 2.283e+02/2.283e+02/2.281e+02, rel reduction: 8.27056e-04, gain ratio: 9.18459e+01
[2024-07-26 14:11:26.816] [info] LM<sym::Optimize> [iter   19] lambda: 1.000e-08, error prev/linear/new: 2.281e+02/2.281e+02/2.279e+02, rel reduction: 8.27005e-04, gain ratio: 9.19070e+01
[2024-07-26 14:11:26.819] [info] LM<sym::Optimize> [iter   20] lambda: 1.000e-08, error prev/linear/new: 2.279e+02/2.279e+02/2.277e+02, rel reduction: 8.26812e-04, gain ratio: 9.19280e+01
[2024-07-26 14:11:26.823] [info] LM<sym::Optimize> [iter   21] lambda: 1.000e-08, error prev/linear/new: 2.277e+02/2.277e+02/2.275e+02, rel reduction: 8.26528e-04, gain ratio: 9.19285e+01
[2024-07-26 14:11:26.825] [info] LM<sym::Optimize> [iter   22] lambda: 1.000e-08, error prev/linear/new: 2.275e+02/2.275e+02/2.273e+02, rel reduction: 8.26187e-04, gain ratio: 9.19178e+01
[2024-07-26 14:11:26.828] [info] LM<sym::Optimize> [iter   23] lambda: 1.000e-08, error prev/linear/new: 2.273e+02/2.273e+02/2.272e+02, rel reduction: 8.25811e-04, gain ratio: 9.19008e+01
[2024-07-26 14:11:26.830] [info] LM<sym::Optimize> [iter   24] lambda: 1.000e-08, error prev/linear/new: 2.272e+02/2.271e+02/2.270e+02, rel reduction: 8.25412e-04, gain ratio: 9.18800e+01
[2024-07-26 14:11:26.833] [info] LM<sym::Optimize> [iter   25] lambda: 1.000e-08, error prev/linear/new: 2.270e+02/2.270e+02/2.268e+02, rel reduction: 8.24998e-04, gain ratio: 9.18569e+01
[2024-07-26 14:11:26.835] [info] LM<sym::Optimize> [iter   26] lambda: 1.000e-08, error prev/linear/new: 2.268e+02/2.268e+02/2.266e+02, rel reduction: 8.24576e-04, gain ratio: 9.18325e+01
[2024-07-26 14:11:26.838] [info] LM<sym::Optimize> [iter   27] lambda: 1.000e-08, error prev/linear/new: 2.266e+02/2.266e+02/2.264e+02, rel reduction: 8.24149e-04, gain ratio: 9.18072e+01
[2024-07-26 14:11:26.840] [info] LM<sym::Optimize> [iter   28] lambda: 1.000e-08, error prev/linear/new: 2.264e+02/2.264e+02/2.262e+02, rel reduction: 8.23719e-04, gain ratio: 9.17814e+01
[2024-07-26 14:11:26.843] [info] LM<sym::Optimize> [iter   29] lambda: 1.000e-08, error prev/linear/new: 2.262e+02/2.262e+02/2.260e+02, rel reduction: 8.23287e-04, gain ratio: 9.17554e+01
[2024-07-26 14:11:26.845] [info] LM<sym::Optimize> [iter   30] lambda: 1.000e-08, error prev/linear/new: 2.260e+02/2.260e+02/2.258e+02, rel reduction: 8.22854e-04, gain ratio: 9.17293e+01
[2024-07-26 14:11:26.848] [info] LM<sym::Optimize> [iter   31] lambda: 1.000e-08, error prev/linear/new: 2.258e+02/2.258e+02/2.257e+02, rel reduction: 8.22422e-04, gain ratio: 9.17031e+01
[2024-07-26 14:11:26.850] [info] LM<sym::Optimize> [iter   32] lambda: 1.000e-08, error prev/linear/new: 2.257e+02/2.257e+02/2.255e+02, rel reduction: 8.21989e-04, gain ratio: 9.16769e+01
[2024-07-26 14:11:26.852] [info] LM<sym::Optimize> [iter   33] lambda: 1.000e-08, error prev/linear/new: 2.255e+02/2.255e+02/2.253e+02, rel reduction: 8.21558e-04, gain ratio: 9.16508e+01
[2024-07-26 14:11:26.855] [info] LM<sym::Optimize> [iter   34] lambda: 1.000e-08, error prev/linear/new: 2.253e+02/2.253e+02/2.251e+02, rel reduction: 8.21127e-04, gain ratio: 9.16248e+01
[2024-07-26 14:11:26.858] [info] LM<sym::Optimize> [iter   35] lambda: 1.000e-08, error prev/linear/new: 2.251e+02/2.251e+02/2.249e+02, rel reduction: 8.20697e-04, gain ratio: 9.15989e+01
[2024-07-26 14:11:26.860] [info] LM<sym::Optimize> [iter   36] lambda: 1.000e-08, error prev/linear/new: 2.249e+02/2.249e+02/2.247e+02, rel reduction: 8.20268e-04, gain ratio: 9.15732e+01
[2024-07-26 14:11:26.862] [info] LM<sym::Optimize> [iter   37] lambda: 1.000e-08, error prev/linear/new: 2.247e+02/2.247e+02/2.245e+02, rel reduction: 8.19841e-04, gain ratio: 9.15475e+01
[2024-07-26 14:11:26.865] [info] LM<sym::Optimize> [iter   38] lambda: 1.000e-08, error prev/linear/new: 2.245e+02/2.245e+02/2.244e+02, rel reduction: 8.19414e-04, gain ratio: 9.15220e+01
[2024-07-26 14:11:26.867] [info] LM<sym::Optimize> [iter   39] lambda: 1.000e-08, error prev/linear/new: 2.244e+02/2.244e+02/2.242e+02, rel reduction: 8.18989e-04, gain ratio: 9.14966e+01
[2024-07-26 14:11:26.870] [info] LM<sym::Optimize> [iter   40] lambda: 1.000e-08, error prev/linear/new: 2.242e+02/2.242e+02/2.240e+02, rel reduction: 8.18565e-04, gain ratio: 9.14713e+01
[2024-07-26 14:11:26.873] [info] LM<sym::Optimize> [iter   41] lambda: 1.000e-08, error prev/linear/new: 2.240e+02/2.240e+02/2.238e+02, rel reduction: 8.18142e-04, gain ratio: 9.14462e+01
[2024-07-26 14:11:26.875] [info] LM<sym::Optimize> [iter   42] lambda: 1.000e-08, error prev/linear/new: 2.238e+02/2.238e+02/2.236e+02, rel reduction: 8.17720e-04, gain ratio: 9.14212e+01
[2024-07-26 14:11:26.877] [info] LM<sym::Optimize> [iter   43] lambda: 1.000e-08, error prev/linear/new: 2.236e+02/2.236e+02/2.234e+02, rel reduction: 8.17299e-04, gain ratio: 9.13964e+01
[2024-07-26 14:11:26.880] [info] LM<sym::Optimize> [iter   44] lambda: 1.000e-08, error prev/linear/new: 2.234e+02/2.234e+02/2.233e+02, rel reduction: 8.16880e-04, gain ratio: 9.13717e+01
[2024-07-26 14:11:26.882] [info] LM<sym::Optimize> [iter   45] lambda: 1.000e-08, error prev/linear/new: 2.233e+02/2.233e+02/2.231e+02, rel reduction: 8.16462e-04, gain ratio: 9.13471e+01
[2024-07-26 14:11:26.884] [info] LM<sym::Optimize> [iter   46] lambda: 1.000e-08, error prev/linear/new: 2.231e+02/2.231e+02/2.229e+02, rel reduction: 8.16045e-04, gain ratio: 9.13227e+01
[2024-07-26 14:11:26.887] [info] LM<sym::Optimize> [iter   47] lambda: 1.000e-08, error prev/linear/new: 2.229e+02/2.229e+02/2.227e+02, rel reduction: 8.15630e-04, gain ratio: 9.12984e+01
[2024-07-26 14:11:26.890] [info] LM<sym::Optimize> [iter   48] lambda: 1.000e-08, error prev/linear/new: 2.227e+02/2.227e+02/2.225e+02, rel reduction: 8.15216e-04, gain ratio: 9.12743e+01
[2024-07-26 14:11:26.892] [info] LM<sym::Optimize> [iter   49] lambda: 1.000e-08, error prev/linear/new: 2.225e+02/2.225e+02/2.224e+02, rel reduction: 8.14803e-04, gain ratio: 9.12504e+01
[2024-07-26 14:11:26.892] [warning] LM<sym::Optimize> Optimization finished with status: HIT_ITERATION_LIMIT

The combination of 1) the fact that you mentioned this used to use GN, and presumably worked OK there, and 2) lots of traces like that second one, where lambda monotonically decreases to a very small value, also makes me think your initial lambda should maybe be much smaller than 1, e.g. 1e-8, which would also make SymForce converge faster. This may also be a reason why you saw dynamic lambda doing worse, because one thing it doesn't do is decrease lambda as quickly.
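
Concretely, that would look something like this (field names from sym::optimizer_params_t; verify them against your SymForce version):

sym::optimizer_params_t params = sym::DefaultOptimizerParams();
params.initial_lambda = 1e-8;       // much smaller than the 1.0 in the traces above
params.lambda_lower_bound = 1e-10;  // smaller min lambda, so lambda can keep shrinking
params.iterations = 50;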

Your points 2 and 4 indicate that the problem is extremely ill-conditioned, and the hessian is either singular or has a very high condition number.

Point 3 might make sense given this: the dense Cholesky solver SymForce uses is Eigen's LDLT, which has pivoting, and that may make it more stable in the face of the hessian being extremely ill-conditioned.
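
A tiny standalone illustration of that robustness difference (plain Eigen, not SymForce code; the matrix is a contrived, exactly singular example):

#include <Eigen/Dense>
#include <iostream>

int main() {
  // A symmetric PSD matrix that is singular: the unpivoted LLT hits a zero
  // pivot and reports NumericalIssue, while the pivoted LDLT still succeeds.
  Eigen::Matrix2d h;
  h << 1.0, 1.0,
       1.0, 1.0;

  const Eigen::LLT<Eigen::Matrix2d> llt(h);
  const Eigen::LDLT<Eigen::Matrix2d> ldlt(h);
  std::cout << "LLT success:  " << (llt.info() == Eigen::Success) << "\n"
            << "LDLT success: " << (ldlt.info() == Eigen::Success) << "\n";
  return 0;
}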

When you say this is VINS-MONO, is it the default implementation https://github.com/HKUST-Aerial-Robotics/VINS-Mono, does it have modifications, is it another implementation, etc.? Partly I'm wondering whether your initializations are already very good, and the optimizations you're doing here are making things much worse than just the initialization, and what it would look like if the optimization just did nothing.

empty-spacebar commented 1 month ago

Here are my further test results. All tests run on the EuRoC MH_05_difficult dataset, using SymForce with the DYNAMIC lambda parameter. 0727.zip

To facilitate your understanding, I made a table: [table image attached to the issue]

And here are some new conclusions:

  1. The value of initial_lambda doesn't make much difference.
  2. A min lambda of 1e-10 is reasonable; the smaller it gets the worse the results, and values that are too small (like 1e-16) crash the system.
  3. Interestingly, I got the fastest runtime when using sparse mode (see the yellow line in the table); it's also the first time SymForce ran in real time. However, it still iterates too many times, usually hitting 50.

Partly I'm wondering whether your initializations are already very good

Both Ceres and SymForce use the same initialization result; I didn't modify the initialization part of VINS-Mono.

empty-spacebar commented 4 weeks ago

If you need it, I can provide the modified VINS-Mono. Keep me updated.

aaron-skydio commented 4 weeks ago

To be honest, it's unlikely that I'm going to have time to look into this soon. Assuming you're talking about a fork of VINS-Mono, running the way they describe in their README for EuRoC on MH_05_difficult with nothing else involved, yeah, it seems like a link to the code should be enough to reproduce. That would be helpful if I or someone else has time to look at some point.

empty-spacebar commented 4 weeks ago

Please leave your email or other contact information, or you can contact me at zhangda12138@gmail.com; then I can send you the code.

aaron-skydio commented 4 weeks ago

It's going to be much easier if you make a fork of VINS-Mono, and link to your fork here. If for some reason you don't want your code to be publicly accessible, you can create a private repo and invite my GitHub account.

empty-spacebar commented 3 weeks ago

Here's the code: https://github.com/wxliu1/VINS-Mono-NotedByLWX