VRichardJP commented 3 weeks ago

At runtime, the robot state is estimated and corrected continuously by the IMU and pointcloud callback pipeline, both running in parallel.

IMU pipeline

The IMU callback gets new IMU measurement, transforms the data using IMU->robot extrinsics and IMU intrinsics.

Then the linear acceleration and angular velocity are corrected using the last known biases: this->state.b.accel and this->state.b.gyro. The data is then pushed to the IMU data buffer.

Then, the robot state is updated by integrating the current measurement over dt (time difference since the last IMU measurement)

Lidar pipeline

The pointcloud callback gets a new lidar scan, transform the data using LIDAR-> robot extrinsics.

Then it uses measurements from the IMU data buffer to estimate the robot pose at the time of each point in the scan (starting from the last GICP-aligned state).

The estimated robot pose at the middle of the scan is used as initial guess for GICP alignment.

After GICP alignment, the robot estimated biases/velocity are corrected based on the error between the current state and GICP-aligned state.

Problems

Let's note t_k the start timestamp of scan k, Dt_k (e.g. 10ms) the total duration of the scan and l_k (e.g. 10~100ms, hopefully faster than lidar sensor frequency) the total latency of the lidar pipeline (including data transfer, processing, etc.).

The state accel/gyro biases, velocity and orientation are updated at the end of the Lidar pipeline. In other words, these values are updated at time t_k + Dt_k + l_k. However, these new bias values are meant to be applied from current scan time t_k. Since the IMU pipeline is running in parallel, all IMU measurements between t_k and t_k + Dt_k + l_k have already been integrated with the previous bias. Basically, the bias update is "lagging" by approximately 1~2 lidar scan. Obviously, such approximation is necessary if we want to publish a real-time estimate of the robot state, for example the one published by publishToROS. However I think this lag has to impact the data in the imu_buffer (used during the lidar pipeline IMU integration). For example, while the IMU pipeline would use lastest known biases to produce the current state estimate, all IMU measurements would be stored uncorrected in the imu_buffer and only corrected in the lidar pipeline during the IMU integration (using a buffer of the last few known biases).
I think there is a bigger problem with the way backpropagation is implemented, as updateState seems to mix 2 totally different states. On one hand, there is the GICP-optimized state from t_(k-1), which goes through the IMU integration to produce an estimated state at t_k, which is then corrected using GICP. On the other hand, there is the global this->state which is synchronized with IMU data. So by the time the lidar pipeline reaches the updateState function, the current state estimate is already ahead of time (approximately t_k + Dt_k + l_k). Then I find 2 problems with the updateState function: 2.1. it compares a pose estimate at time t_k with a pose estimate at time t_k + Dt_k + l_k to compute a correction. 2.2. the correction is applied on the current state, by doing as if the time difference between the two states was the lidar scan frequency (=Dt_k). Thus ignoring l_k may be actually big. I think 2.1 can be the source of many problems. The geometric observer should rather compare poses with the same timestamp: the initial guess from IMU integration with the GICP-aligned state. Then the correction should be applied to the current state by using the time difference between the previous lidar scan t_(k-1) and the latest state stamp (=latest IMU measurement, approx. t_k + Dt_k + l_k). Indeed, the incorrected biases/velocity/orientation used from t_(k-1) to t_k have also been used after t_k. Note that if we do so, the problem 1 described above does not exist anymore.

VRichardJP commented 3 weeks ago

As I get into the implementation details, I realize I missed an important detail regarding the current implementation: the current lidar pipeline does not use state but an hybrid state mixing lidarPose (which is last GICP aligned pose at t_(k-1)) and current state velocity as a starting point for IMU integration (let's ignore the fact velocity is not synchronized as explained above). So if updateState is modified to compare the estimated pose at t_k with the GICP-aligned pose at t_k, then there is no link with state and it may slowly drift away. That is obviously not what we want.

But then, I'm wondering whether this mixed state (GICP pose + IMU integrated velocity) really makes any sense. Why not use preintegrated state directly? Of course, a snapshot at ~t_(k-1), not the latest one.

In other words, instead of keeping only the latest estimated state, all the states computed by the IMU pipeline would be stored in a queue. Then, when a new point cloud is received, the queued state with the closest timestamp (< current scan stamp) would be used as start point of the IMU integration instead of lidarPose. The deskewing process should yield the exact same result, but the new initial estimate of the state at t_k would now really be the same "state" the IMU pipeline has been working on (it still should be close from the original lidarPose based guess)

Finally, back propagation (updateState) would be done by comparing the initial guess to the GICP aligned pose, and states further in the queue would then be corrected based on how much time has passed since t_k.

Note there are not necessarily that many states to update, since all the states up to t_k + Dt_k (but one) could be dropped as there would not be needed for the later scans.

Also note this new approach is way more deterministic than the current implementation: order of processing of IMU and lidar scan messages almost does not matter (still does a little because of updateState), while synchronized processing of the lidar and IMU messages is critical for the current design.

VRichardJP commented 2 weeks ago

It took me a while, but I think I have now a working implementation of this approach, including https://github.com/vectr-ucla/direct_lidar_inertial_odometry/issues/58.

With a pure discrete integration based implementation and without much tuning on the geometric observer parameters, I have measured significant improvement on the predicted pose accuracy compared to the current implementation. There are too many changes for me to make a PR, but if anyone is interested, here are the significant changes I have made to the code:

states and IMU measurements are now timestamped and stored in a queue (only the most recent are kept). All states are synchronized with IMU measurements (1 state for each measurment)
updateState, propagateState and integrateImuInternal are all replaced by a single integrateGeoDiscrete function. This function takes:
- a state with timestamp t
- an IMU measurement with timestamp t+dt
- optionally, an estimated pose (from pure IMU integration) and true pose (from GICP alignment), both corresponding to a time t': t < t' < t+dt
integrateGeoDiscrete returns the estimated state at time t+dt from discrete integration (first-order Taylor expansion). If the optional estimated and true poses are provided, this integration includes all corrections factors described in https://github.com/vectr-ucla/direct_lidar_inertial_odometry/issues/58. The only difference I have with the original paper at the moment is that I set angular acceleration to 0 for now, because my IMU data is way too noisy and I have not found any good filter yet.
the IMU pipeline is still the same: when a new IMU measurement is received, it is pushed in the buffer queue, and I run integrateGeoDiscrete on the latest state to generate a real-time estimate of the current state (placed at the end of the state queue)
the LIDAR pipeline works slightly differently:
- deskewing works by first finding a prior_state in the state queue, for which prior_state.stamp <= scan_stamp < next_state.stamp
- then each point state is estimated using pure discrete integration, by first integrating all IMU measurements with stamp before the target point, then by running a "partial" discrete integration with the new IMU message (dt' = t'-t < dt).
- from the prior_state, I also calculate scan_state: the estimated robot state at scan stamp (still using a "partial" discrete integration of the next IMU measurement)
- I align scan_state using GICP, the GICP-aligned pose becomes my "true pose" and scan_state pose my estimated pose
- finally, I reset the state queue, by restarting from prior_state (the last state before scan stamp, previous states have no more use), then integrating a single IMU message with the estimated and true poses to apply the corrections, then integrating all remaining IMU messages. Old unused IMU messages are then deleted.

The reason I run a single integration with correction is because the estimated and true poses correspond to scan time t', where t < t' < t+dt with t being the prior_state stamp and t+dt the IMU measurement stamp I want to integrate. According to theory, the geometric observer correction only makes sense "around" t', so I find it is best to apply the correction only between t and t+dt and not over different or larger time windows. Of course, it means the K* parameters have to be tuned to reflect the shorter integration time (@ IMU freq) compared to the original implementation (@ LIDAR freq).

I think this approach is already more robust than the original design. AFAIK, this approach is also 100% deterministic (processing order does not matter)

kennyjchen commented 1 day ago

Thanks for the extensive analysis and work @VRichardJP. A PR would certainly be appreciated if possible!

vectr-ucla / direct_lidar_inertial_odometry

Regarding state access and update timeline #57

IMU pipeline

Lidar pipeline

Problems