leggedrobotics / elevation_mapping_cupy

Elevation Mapping on GPU.
MIT License

PointCloud Sensor Elevation Uncertainty Model #93

Open jwag opened 7 months ago

jwag commented 7 months ago

First, thanks for making this library. I am planning to use it to implement some mapping of soil properties during bulldozing earthmoving operations.

I have been working my way through the code for Elevation Mapping CUPY and referring back to the paper [1] along the way. I am struggling to understand how the measurement uncertainty is computed for a pointcloud. The paper references another paper that develops a noise model for a Kinect sensor [2] as justification for replacing the elevation noise model of Fankhauser [3], equations 1-3, with a noise model quadratic in the distance of the measured point from the sensor. The noise model proposed in [2] is composed of a radial component and an axial component, which are determined to be linear and quadratic in the depth to the point, respectively. If we assume that the radial component is small with respect to the axial component (as is justifiable for larger distances, as shown in [2]), then the quadratic-in-depth axial component is a decent approximation of the uncertainty distribution. To properly account for the sensor uncertainty in the z direction of the map, this axial uncertainty must then be projected onto the z axis; the axial uncertainty lies along the unit vector from the sensor to the point. This projection is not discussed in the paper [1].

In the function add_points_kernel() in the custom_kernels.py script, a point measurement is provided (presumably in the sensor frame) and the function z_noise() is used to compute the height variance as v = sensor_noise_factor * rz * rz, where rz is the point's z coordinate in the sensor frame. This is not the same as the distance from the sensor to the point, d = |p|, as described in [1] and [2]. The value of rz depends on the choice of sensor coordinate frame, and there is no coordinate frame for a LiDAR or camera sensor in which |p|^2 = rz^2 for every point p.

Based on my understanding, the noise should be computed from d^2 using the function point_sensor_distance() and then projected onto the z axis, along the lines of the sketch below.
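To make the suggestion concrete, here is a minimal sketch (the function and parameter names are mine, not from the codebase) that computes an axial variance quadratic in d and projects it onto the map z axis, assuming R_map_sensor rotates sensor-frame vectors into the map frame:

```python
import cupy as cp

def projected_z_variance(points_sensor, R_map_sensor, noise_factor):
    # points_sensor: (N, 3) points in the sensor frame.
    # R_map_sensor: (3, 3) rotation from the sensor frame to the map frame.
    d2 = cp.sum(points_sensor ** 2, axis=1)        # squared distance d^2
    axial_var = noise_factor * d2                  # assumed variance along the ray
    # Unit ray direction for each point, expressed in the map frame.
    u = points_sensor / cp.sqrt(cp.maximum(d2, 1e-12))[:, None]
    uz = (R_map_sensor @ u.T)[2]                   # z component of the map-frame ray
    # Project the one-dimensional axial variance onto the map z axis.
    return axial_var * uz ** 2
```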

I may be misunderstanding something here, so please correct me if I’m wrong.

References:
[1] "Elevation Mapping for Locomotion and Navigation using GPU," Takahiro Miki et al., 2022.
[2] "Modeling Kinect sensor noise for improved 3D reconstruction and tracking," Chuong V. Nguyen et al., 2012.
[3] "Robot-centric elevation mapping with uncertainty estimates," Peter Fankhauser et al., 2014.

mktk1117 commented 7 months ago

Hi jwag, thanks a lot for going through our work and checking the details. It is definitely more correct to consider the projection and to use different noise parameters for each sensor.

Implementing a better sensor noise model is one of the TODOs I have not been able to finish yet due to time constraints (at least the current model works okay in practice). I am happy to integrate your suggested changes if you have a chance to improve them. If you decide to use our software and implement these, it would be very beneficial for everyone if you could make a PR.

Thanks.

jwag commented 6 months ago

Hey mktk1117, I am interested in making a PR if you think it would actually improve the performance of the system. It seems like this has worked rather well for you all in DARPA Sub-T and other real-world scenarios. (I apologize for the long response here. I am very interested in this work and am trying to make sure that I understand what the problems are before I spend time trying to make any improvements.)

My understanding is that one of the main motivations for creating elevation_mapping_cupy (EM_cupy) was to speed up the elevation_mapping approach by Fankhauser [3] (I will refer to it as EM), handle larger pointclouds via GPU processing, and enable tighter integration with learning systems for traversability and semantics. The EM_cupy approach to propagating uncertainty and accounting for pose drift given a moving map frame is different from the EM approach. I have a few questions about your design decisions that I was hoping you could answer before I make any changes.

  1. In EM, significant effort is taken to produce an elevation map in a robot-centric coordinate frame with accurate bounds on the height uncertainty. This includes tracking the x, y, and z uncertainties of each cell and then transforming these into height uncertainties by sampling adjacent cells in the map-fusion process. This step seems to have been skipped in EM_cupy; as far as I can tell, the robot pose uncertainty produced by a state estimation system is not even used. My assumption is that this step can be skipped because of some of the modifications you have implemented, including:

    • Moving the map frame occasionally, but not performing map-fusion and not updating the elevation uncertainty
    • Drift compensation: matching incoming scans to the existing map and shifting the map toward the scan along the vertical axis
    • Incrementing the elevation variance of each cell over time

    If the localization system has drift and the map frame is still moved with the vehicle without pose uncertainty propagation, then the accuracy of the map (particularly in parts of the map that are not receiving new height observations) will degrade, as expected. Without pose uncertainty propagation and map-fusion, though, this degradation is not accounted for in the cell variance. However, increasing the variance of each cell with time approximates the effect that a constant localization drift would have on the variance. Additionally, the vertical drift is corrected using the drift compensation, and the visibility cleanup helps to clear some artifacts created by the drift in xy. Do you think that I am on track with this interpretation?

  2. I think I understand the sensor models better now.

    • The Kinect sensor errors are modeled as a noise distribution in the axial direction (the sensor z axis), and the lateral (xy plane) uncertainty is modeled isotropically [2]. So the current use of z_noise() makes decent sense as a simplification of the Kinect sensor noise model.
    • LiDAR sensor errors are modeled differently: a new coordinate frame must be defined for each point, with the z axis aligned with the ray from the sensor to the point [5]. The noise is then modeled as axial and lateral in that point-dependent frame. This doesn't actually appear to be implemented in EM_cupy; instead, the Kinect-style coordinate frame convention is used. This appears to be incorrect to me.

    I have worked my way through all of this math (and documented it, if you are interested) and think that I could implement both an improved noise model for a Kinect-style sensor and a LiDAR sensor (with the point-wise coordinate frame error propagation; a sketch of the resulting z-variance is given after this list). My question is: are you observing any issues related to the sensor noise model or error propagation that you think these changes would help with?

  3. In some earlier approaches to elevation mapping [4], the full 3D probability distribution representing the uncertainty of a point measurement was propagated into the map frame, and this distribution was then used to weight updates to multiple cells in a region around the measurement. In [3] a similar approach is taken in the map-fusion process, where cells within a 2$\sigma$ distance of the center cell are sampled to update the height of the current cell. In both approaches, the effect is to smooth the map and to fill in gaps. I know that in EM_cupy the smoothing and in-painting plugins can do this in a different way. I also know that EM_cupy is able to handle larger and denser pointcloud data, so it may not be worth implementing. Given that map-fusion isn't being performed in EM_cupy, is there a need for something like this?
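Regarding the point-wise LiDAR frame in item 2, the z-variance has a closed form: with unit ray direction u (expressed in the map frame), axial standard deviation $\sigma_a$, and lateral standard deviation $\sigma_l$, the covariance is $\Sigma = \sigma_a^2 u u^T + \sigma_l^2 (I - u u^T)$, and its projection onto the map z axis reduces to $\sigma_a^2 u_z^2 + \sigma_l^2 (1 - u_z^2)$. A minimal sketch (function and parameter names are illustrative, not from the codebase):

```python
import cupy as cp

def lidar_z_variance(points_map, sensor_origin_map, sigma_a, sigma_l):
    # points_map: (N, 3) points in the map frame; sensor_origin_map: (3,).
    # Per-point frame with one axis along the sensor-to-point ray [5]:
    #   Sigma = sigma_a^2 * u u^T + sigma_l^2 * (I - u u^T)
    # whose projection onto the map z axis is
    #   var_z = sigma_a^2 * uz^2 + sigma_l^2 * (1 - uz^2).
    rays = points_map - sensor_origin_map          # sensor-to-point vectors
    d = cp.linalg.norm(rays, axis=1)
    uz = rays[:, 2] / cp.maximum(d, 1e-12)         # z component of the unit ray
    return sigma_a ** 2 * uz ** 2 + sigma_l ** 2 * (1.0 - uz ** 2)
```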

References:
[1] "Elevation Mapping for Locomotion and Navigation using GPU," Takahiro Miki et al., 2022.
[2] "Modeling Kinect sensor noise for improved 3D reconstruction and tracking," Chuong V. Nguyen et al., 2012.
[3] "Robot-centric elevation mapping with uncertainty estimates," Peter Fankhauser et al., 2014.
[4] "Uncertainty-based Sensor Fusion of Range Data for Real-time Digital Elevation Mapping (RTDEM)," Cremean and Murray, 2005.
[5] "Noise characterization of depth sensors for surface inspections," Pomerleau et al., 2012.

jwag commented 6 months ago

Hello! I have started work on the PR, but would really like some feedback if you are available. I am also still interested in your response to the above questions.

I have currently implemented a SensorProcessor class that takes in a pointcloud, some transforms, and some uncertainty estimates, and produces a propagated uncertainty for each point. I have done this using built-in CuPy functions, without custom kernels. I have benchmarked against NumPy and they both run at about the same speed, with CuPy running a bit slower. I don't know whether that means I have done something wrong (I haven't worked with CuPy before).
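As an aside, one thing worth ruling out in that comparison: CuPy launches kernels asynchronously, and the first call pays kernel compilation cost, so a naive timer can mis-measure the GPU code. A minimal sketch of the kind of timing harness I mean (the helper name is just illustrative):

```python
import time
import cupy as cp

def gpu_time(fn, *args, n=100):
    fn(*args)                            # warm-up: first call compiles kernels
    cp.cuda.Device().synchronize()       # drain pending GPU work before timing
    t0 = time.perf_counter()
    for _ in range(n):
        fn(*args)
    cp.cuda.Device().synchronize()       # wait for the timed kernels to finish
    return (time.perf_counter() - t0) / n
```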

My plan is to integrate this into the elevation_mapping class next. I also have LaTeX documentation of the derivation that I could provide after some cleanup.