Sensor model constraints with initial guess from current graph optimization

Hi,

We are fusing sensor data from odometry, IMU and laser data scan-to-scan and scan-to-map using fuse with the fixed lag smoother optimizer and noticed an issue for our scan-to-scan and scan-to-map sensor models/constraints. Especially our used scan-to-map algorithm needs a good initial guess and therefore we wrote our own ceres cost function to take the guess from the optimization loop. At our own cost function we take at the overridden function Evaluate the (first) incoming guess, compute the scan-to-map pose estimate, cache it for subsequent calls, and then calculate the residuals and jacobians based on the pose error and covariance of the pose estimate.

This approach works most times fine but at some strange situations we noticed that our localization is moving either just shortly for a few samples but sometimes even permanent although the robot is at standstill. We found then out that the initial guess at the Evaluate function of the cost function is sometimes uninitialized and exactly zero which in some situations causes the above described behavior.

So our questions are now:

How would you set an initial guess (for a scan-to-map algorithm) when at the callback when getting the sensor data (in this case laser pointcloud) the initial guess is not known? At best we would like to incorporate also the other new sensor data (in this case odometry and IMU) in the initial guess such that the initial guess is as good as possible.
If there is no good option for the first question, what would you think of extending the fixed lag smoother optimizer with a variant of stepwise optimization with configurable stages with for example first optimize using only odometry and IMU and then optimize using all sensors? If you think this is a good idea without any bad side effects, we could also try to implement this in a PR ourselves.

If you have any further questions, don't hesitate asking.

Thanks, Fabian

You could take the most recent pose in the graph (T0), then use all the odometry and imu measurements that came after that pose to estimate the relative change from it. You could do that by directly using fuse's hash graph to add whatever odometry and imu constraints between T0 and your pose at the desired timestamp (T1) to do a standalone optimisation in your sensor model. The initial guess for T1 could be obtained from the odometry? Then optimizing with the imu constraints as well gives you the fused initial estimate

"We found then out that the initial guess at the Evaluate function of the cost function is sometimes uninitialized and exactly zero" That sounds like a bug. By the time the Ceres cost functions are evaluated, it is intended for all variables to have been initialized with valid initial guesses. Any information you can share with me about the conditions when this happens would be appreciated. Things like the set of variable IDs and values in the order they were added to the graph would be helpful.
Generating initial guesses is difficult, and I don't have a great general-purpose solution for that. For your specific problem, I see two very different cases. The first case involves the starting condition, or global localization. How does your system handle the very first pose? Is the scan-to-map algorithm expected to search the entire map? If so, then perhaps your scan-to-map sensor should be an "ignition" sensor. If not, then perhaps the scan-to-map sensor only starts running after it receives the first "notify()" call containing the robot's pose. The second case involves how the robot's pose is updated/tracked as the optimizer runs. The scan-to-map sensor could register a callback for the graph updates from the optimizer. Every time an optimization completes, the updated graph will be sent to all the sensors. The scan-to-map sensor could extract the most recent optimized state, and then predict the robot's position at the next laserscan timestamp using a kinematic model.
Running multiple optimization phases per "update" would be expensive, but I'm not opposed to it if you find it useful.

Hello @svwilliams.

Thank you for your great feedback!

I will start with point 1 and 2 and last evaluate point 0 because this takes a bit longer:

Our scan-to-map algorithm cannot search the entire map so so we have another "ignition" sensor that takes care of that. So indeed it only starts running after the first "notify()" call.
- The steps for our scan-to-map model are as follows:
  1. A new laserscan comes in and we create a new transaction containing two variables (position+orientation) which are not initialized with any value, the transaction stamp is the laserscan stamp, the transaction "involvedStamps" is again just the laserscan stamp, and a custom constraint with a custom evaluation function overriding ceres::SizedCostFunction::Evaluate
    - Our expectation was that fuse takes care about the motion model prediction and then uses the value of that in the evaluation of the constraint cost function
  2. When the custom evaluation function is called for the first time, it takes the function parameter parameters which actually contains the value of the position and orientation at that first optimization step (so no optimization was done yet and everything is directly from the graph before optimization), takes it as pose guess, performs the scan-to-map algorithm with that, cache the result, and then do the comparison between the optimizer's incoming pose and the pose computation from the scan-to-map algorithm.
  3. For any subsequent calls of the custom evaluation function, to save computing time, the scan-to-map pose computation does not need to be done again but takes the cached value from before (a small difference in the estimate from the optimizer should not make any difference).
    - There is a similar procedure for the scan-to-scan algorithm but in this case with two poses (each with position + orientation).
    - Taking the most recent optimized state and having a separate kinematic model is an option (so no uninitialized values in step 1) but our idea is that an optimization (and therefore also pose estimation in this case) with other sensors too (odometry + IMU in our case) is more accurate than just using the last optimized state and predicting it with a separate kinematic model. Also, I think it is nicer to use one kinematic model predication variant (that of fuse) instead of having another separate predication in our sensor models that fuse is not aware of.
We found it useful to have multiple optimization phases to treat sensors differently. In our case we would first optimize everything with just the motion model (and the ignition if existing), then relative constraints based on a direct measurement (odometry + IMU), then relative constraints using indirect measurements (scan-to-scan) and last absolute constraints based on an indirect measurement. By that someone can also assign "priorities" to sensor models. For our precise (but unfortunately not so robust) scan-to-map algorithm we would then also already have a much better estimation than just using motion prediction but also already odometry + IMU and scan-to-scan.

Now last to point 0:

After your answer I got the feeling that this is not at all intended and therefore today looked again more closer to give you more insights. Attached you can find five files describing a situation where an uninitialized variable occurs:

graph_6400000000 gv is the graph visualization before the optimization with the error
graph_6500000000 gv is the graph visualization after the optimization with the error
graph_6400000000.txt is a yaml file of the whole graph after the optimization including the added transaction before the optimization. All of that is from the notify() function after the optimization. The timestamp is from the time before the optimization with the error.
graph_6500000000.txt is a yaml file of the whole graph after the optimization including the added transaction after the optimization with the error. The timestamp is from the time after the optimization with the error.
log_extraction.log is a log extraction of some new debug outputs that I just made before (the debug outputs are not part of any pushed code to my repository)

You will notice that in graph_6500000000.txt there are two new added variables in the transaction, ecaa765f-54f9-504a-87aa-286d4f1012c4 (line 246) and fe745f27-6e2e-59d4-806c-57c8a7b9a99c (line 236). They are part of our ScanToScanConstraint, the Unicycle2DStateKinematicConstraint, and ScanToMapConstraint. Both are not initialized although we would have expected that they are getting initialized by the motion model of the unicycle_motion_model. They are also new and the latest stamp in the graph. This is the behavior that I meant with uninitialized. That same value is then of course used in our pose estimation outlined above in step 2. In log_extraction.log we see some more debug outputs of fuse_core/timestamp_manager.cpp::query() (line 62 at the current devel branch). I then found very interesting that in line 9 of the log the newly added motion model transaction is printed (the transaction that is merged here). In line 68 of the log you see that the variable ecaa765f-54f9-504a-87aa-286d4f1012c4 is added with a nice initialized value. Interestingly at line 283 the same variable is again here but this time not initialized. It also makes sense for me to not create any new kinematic constraint as it is just one timestamp within the transaction. I then saw that the second time is caused by having a second pending transaction with just the transaction outlined in line 269 in the log which causes a second time to apply the motion model (see here). For me this feels now that we should better first merge all pending transactions and then apply the motion model such that nothing gets overwritten by a later uninitialized variable.

However, by thinking about this problem again I found out that I can get around this issue by adding to the scan-to-map model the latest graph pose timestamp to the involvedStamps. By using that, the pending transaction of scan-to-map also gets added motion model prediction containing from the last graph pose to the new pose from the scan-to-map timestamp and is then nicely initialized. It then still overwrites the same variable from (in the case above) scan-to-scan with its value but as all motion model kinematics from the same timestamp are cached, this is no problem.

The final question for me is now: Is the change in our implementation then just a workaround or do you see this as an intended function? (which means this is no bug)

I believe the issue you are experiencing is a bug in the TimestampManager. The intent of this block is to update the variables in the sensor transaction to match the motion model, even for existing motion model segments. https://github.com/locusrobotics/fuse/blob/devel/fuse_core/src/timestamp_manager.cpp#L116-L135

In your example, a scan-to-scan constraint is first added between 6.4s and 6.5s. Since 6.5s is a new timestamp, this causes the timestamp manager to generate a new motion model constraint between the last known time (6.4s) and the new time in the transaction (6.5s).

I would expect augmented_stamps to contain two entries [6.4, 6.5] https://github.com/locusrobotics/fuse/blob/devel/fuse_core/src/timestamp_manager.cpp#L82-L100

And stamp_pairs to contain a single entry [ {6.4, 6.5} ] https://github.com/locusrobotics/fuse/blob/devel/fuse_core/src/timestamp_manager.cpp#L102-L147

And I would expect this block to be skipped because the 6.4->6.5 motion mode segment does not exist yet. https://github.com/locusrobotics/fuse/blob/devel/fuse_core/src/timestamp_manager.cpp#L111-L136

And the variables in the transaction are updated to match the motion model here: https://github.com/locusrobotics/fuse/blob/devel/fuse_core/src/timestamp_manager.cpp#L171

From your logs, that appears to be working as expected.

And then a second constraint is processed, this time containing a scan-to-map constraint at 6.5s.

After rereading the code carefully, I would expect augmented_stamps to contain one entry [6.5] https://github.com/locusrobotics/fuse/blob/devel/fuse_core/src/timestamp_manager.cpp#L82-L100

Because there is only one entry, no timestamp pairs are created here: https://github.com/locusrobotics/fuse/blob/devel/fuse_core/src/timestamp_manager.cpp#L104-L106

And thus, the variables in the sensor constraint will never execute the "update variable" code here: https://github.com/locusrobotics/fuse/blob/devel/fuse_core/src/timestamp_manager.cpp#L116-L134 Since that code exists inside the timestamp pair processing. This is the bug.

When you modified your code to add a second timestamp to the scan-to-map Transaction, you allowed timestamp pairs to be created, and thus allowed the current code to update the transaction variables.

I'll put together a unit test to recreate the scenario and verify it fails. Then I'll work on updating the logic to handle this case correctly. Thanks for providing the debugging logs; that was extremely helpful. I'll let you know when I have something ready to test.

I've created two unit tests for the TimestampManager to recreate your example: https://github.com/locusrobotics/fuse/blob/issue-300-prior-on-last-stamp/fuse_core/test/test_timestamp_manager.cpp#L868-L975

I have confirmed that when creating a transaction on the single most recent variable, that variable does not get updated with the value from the motion model.

But when adding a transaction involving two or more variables, all variable values do get updated to match the motion model like we would expect.

This is definitely a bug in the TimestampManager. It may take me a few days, but I'll work out a fix and let you know when it is ready.

I have a potential fix available for review: https://github.com/locusrobotics/fuse/pull/323

To the best of my knowledge, this should fix the missing initial conditions issues you described. But I'm unclear whether this resolves the original issue of needing to stage sets of transactions to be added to the graph in a specific order. If you believe that is still needed, we can work on PR for the step-wise optimizer.

locusrobotics / fuse

Sensor model constraints with initial guess from current graph optimization #300

Now last to point 0: