nutonomy / nuscenes-devkit

The devkit of the nuScenes dataset.
https://www.nuScenes.org

LIDAR_TOP calibration issue #902

Closed enrico-stauss closed 1 year ago

enrico-stauss commented 1 year ago

Hi,

I'm currently looking into the data provided by the LIDAR_TOP sensor and stumbled over something that seems odd to me and might indicate a calibration issue. Below you see the point distribution of a selected scene from the nuScenes mini dataset, where I loaded the data with ref_chan=LIDAR_TOP, so it should not undergo any transformations. I then simply transformed the points to spherical coordinates and mapped the point count to the angular position. At +90 degrees azimuth you can see the front of the car.

What puzzles me are some 'shadows' and 'peaks' that can be observed especially around objects to the left/right of and behind the car.

Similar observations can be made throughout many scenes, and to me it looks like the data is shifted backwards (and/or tilted) relative to the sensor coordinate frame. I can roughly even out those spots by adding between 10 and 20 cm to the y coordinate of the LiDAR data before transforming, but the amount is not consistent across scenes. I do understand that the spacing between the rings might not be equidistant (I picked that up in some paper), but this would not explain the curvature of the ring with low point count, and that curvature also decreases when the data is shifted.

image

My question is whether the issue is entirely on my side or whether this could in fact be an issue with the dataset. I imagine that range-imaging approaches would suffer the most from it.

One last question I have is regarding the high point count in the leftmost part of the image. I'd assume the sensor simply includes some points from a previous sweep there. Any thoughts on that?

Thanks in advance and kind regards Enrico

whyekit-motional commented 1 year ago

@enrlc0 is your rendering based on only a single sweep?

enrico-stauss commented 1 year ago

Yes @whyekit-motional, exactly, but it's similar with multiple sweeps; you then get more than one "shadow", with different spacings. This would be the same scene with 2 sweeps:

image

And another frame that better shows what I meant for 1, 2 and 3 sweeps (this would be the forward view only):

image image image

Edit: With the added translation (as mentioned) I can only even out the artifacts in the target frame when loading multiple sweeps. Translating the previous point clouds by the same shift BEFORE transforming to the target frame requires increasingly large shifts, and translating them AFTER transforming to the target frame does not seem to yield the expected effect.

whyekit-motional commented 1 year ago

@enrlc0 if you could share a (working) code snippet on how you produced the above range-view images, we could dive a little deeper into this

enrico-stauss commented 1 year ago

@whyekit-motional sure, no worries. I threw together some bits and pieces to reproduce the image I posted (and some more). You just need the devkit and the mini dataset to run it, and you'll be asked for the data root the first time you run it.

Snippet
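In essence it does something like the following (not the exact snippet; the dataroot is a placeholder):

```python
import numpy as np

from nuscenes.nuscenes import NuScenes
from nuscenes.utils.data_classes import LidarPointCloud

# Placeholder dataroot pointing at the v1.0-mini split.
nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=False)
sample = nusc.sample[0]

# Single sweep with LIDAR_TOP as both source and reference channel,
# so no cross-sensor transformation is applied to the points.
pc, _ = LidarPointCloud.from_file_multisweep(
    nusc, sample, chan='LIDAR_TOP', ref_chan='LIDAR_TOP', nsweeps=1)
x, y, z = pc.points[0], pc.points[1], pc.points[2]

# Spherical angles: azimuth around the sensor, elevation above the horizontal.
azimuth = np.degrees(np.arctan2(y, x))
elevation = np.degrees(np.arcsin(z / np.sqrt(x ** 2 + y ** 2 + z ** 2)))

# 2D histogram of point counts over the angular grid (the kind of plot shown above).
counts, az_edges, el_edges = np.histogram2d(azimuth, elevation, bins=(720, 64))
```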

Additionally, find this sketch to illustrate why a shifted coordinate system (COOS) could be the reason for the artifacts we're seeing. Note that red lines indicate emitted/recorded points, and green lines (or rather the spacing in between) indicate to which spherical region the points are mapped in the shifted COOS. The blueish section would be a shadow in the range image and the orange one a peak.

sketch_shifted_coos

However, a tilt or a combination of shift and tilt is equally likely IMHO (if the error is not in my code or understanding).

whyekit-motional commented 1 year ago

Thanks for the very nice code snippet, @enrlc0!

Judging from the lidar points projected onto the various camera frames, I don't think it's a calibration issue, since the points match up relatively well with the objects in the camera frames: image

(Since it is not a calibration issue, I will close this issue.)
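For reference, the projections above can be produced with the devkit's renderer, roughly like this (dataroot and sample choice are placeholders):

```python
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=False)
sample_token = nusc.sample[0]['token']

# Project LIDAR_TOP points into a few camera frames; if the extrinsics were off,
# the points would visibly drift away from object boundaries in the images.
for cam in ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_BACK_LEFT', 'CAM_BACK_RIGHT']:
    nusc.render_pointcloud_in_image(
        sample_token, pointsensor_channel='LIDAR_TOP', camera_channel=cam, dot_size=2)
```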

Based on your code, I can't yet figure out why there are shadows in your range-view image, but I can continue looking into it when I have time.

In the meantime, here are some things you could try:

enrico-stauss commented 1 year ago

Hi @whyekit-motional, unfortunately I cannot agree with you on this, except that "calibration issue" might be the wrong term. You are right that the transformation between LiDAR and camera is correct, and the five top images were produced purely with devkit functionality. But if you look closely at the pole of the traffic light or the tree, you can see the shadow to the left, just as in the range image. In the image projection this would not be completely unexpected.

My understanding is that the LiDAR data is provided in a sensor frame which is not exactly the original LIDAR_TOP frame. All given transformations map correctly between the sensors, but it's just not exactly the sensor's true coordinate frame. Do you apply any further transformations when going from raw data to xyz, like a calibration, given that the transformation from/to the sensor is referenced as "calibrated_sensor", as in current_cs_rec = nusc.get('calibrated_sensor', current_sd_rec['calibrated_sensor_token'])?
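For reference, this is my understanding of the transform chain, modeled on LidarPointCloud.from_file_multisweep (a sketch; the dataroot is a placeholder, and as far as I can tell only the rigid calibrated_sensor and ego_pose transforms are involved):

```python
import os.path as osp

from pyquaternion import Quaternion
from nuscenes.nuscenes import NuScenes
from nuscenes.utils.data_classes import LidarPointCloud
from nuscenes.utils.geometry_utils import transform_matrix

nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=False)
sd_rec = nusc.get('sample_data', nusc.sample[0]['data']['LIDAR_TOP'])
pc = LidarPointCloud.from_file(osp.join(nusc.dataroot, sd_rec['filename']))

# Rigid transform from the LIDAR_TOP sensor frame to the ego-vehicle frame
# (this is the 'calibrated_sensor' record referenced above) ...
cs_rec = nusc.get('calibrated_sensor', sd_rec['calibrated_sensor_token'])
car_from_sensor = transform_matrix(cs_rec['translation'], Quaternion(cs_rec['rotation']), inverse=False)

# ... and from the ego frame to the global frame at this sweep's timestamp.
pose_rec = nusc.get('ego_pose', sd_rec['ego_pose_token'])
global_from_car = transform_matrix(pose_rec['translation'], Quaternion(pose_rec['rotation']), inverse=False)

pc.transform(global_from_car @ car_from_sensor)  # points now in global coordinates
```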

Coming to your points:

I hope you keep looking into this :) Kind regards Enrico

Edit: I made the effort of quickly visualizing the same point cloud in cartesian coordinates as a scatter plot where I mapped the elevation z to the color. For better visibility, I put the colorbar on a SymLogNorm (where the range from -0.1 m to 0.1 m is scaled linearly). I also added the height of the sensor to the data to make the points level with the theoretical ground. The plot indicates that either the entire car or the sensor is tilted, such that the region in front of the car appears elevated and the region towards the back appears lowered. I again observed the same throughout the mini dataset, but not with constant magnitude. cart_elevation.pdf
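The plot is produced roughly like this (a sketch; the dataroot is a placeholder and the sensor height is read from the calibrated_sensor record):

```python
import matplotlib.pyplot as plt
from matplotlib.colors import SymLogNorm
from nuscenes.nuscenes import NuScenes
from nuscenes.utils.data_classes import LidarPointCloud

nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=False)
sample = nusc.sample[0]
pc, _ = LidarPointCloud.from_file_multisweep(
    nusc, sample, chan='LIDAR_TOP', ref_chan='LIDAR_TOP', nsweeps=1)

# The z offset of LIDAR_TOP in the ego frame is roughly its height above ground,
# so adding it makes z = 0 correspond to the nominal ground plane.
sd_rec = nusc.get('sample_data', sample['data']['LIDAR_TOP'])
sensor_height = nusc.get('calibrated_sensor', sd_rec['calibrated_sensor_token'])['translation'][2]

x, y = pc.points[0], pc.points[1]
z = pc.points[2] + sensor_height

# Color by elevation on a symlog scale that is linear between -0.1 m and +0.1 m.
sc = plt.scatter(x, y, c=z, s=0.2, cmap='coolwarm',
                 norm=SymLogNorm(linthresh=0.1, vmin=-1.0, vmax=4.0))
plt.colorbar(sc, label='elevation above nominal ground [m]')
plt.gca().set_aspect('equal')
plt.show()
```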

nightrome commented 1 year ago

Hi @enrlc0. I'm joining the party late. Regarding your observation that there are shadows on the side of each object: I don't think it is an issue. We need to understand that the lidar point cloud is collected over 50 ms. Assuming a car velocity of 10 m/s, this means that the car moves 0.5 m during this period. There is of course motion compensation to bring all points into the same reference frame. Let's assume this reference frame is at the beginning of the sweep (or the end, I don't remember). Then there will be lidar points recorded that are "behind" other points, and there will be places that don't get any points.

enrico-stauss commented 1 year ago

Hi @nightrome, thanks for the (late) reply. It took me a while but in the end I figured it out.

nightrome commented 1 year ago

@enrlc0 Great. Can you share your insights?

enrico-stauss commented 1 year ago

Yeah sure @nightrome, as you noted it comes down to the EGO velocity and the comparatively low rotation speed of the sensor. Imagine for example a tree to the left, with a clockwise-spinning sensor as indicated in the sketch below.

image

The point cloud is logged at timestamp $t^0$; now consider the point recorded at $t^{-1}$ which hit the tree. Given a stationary EGO, the next point, measured at $t^0$ with an azimuthal difference of $\Delta \varphi$, would hit the tree too. But since the EGO has moved in the meantime, that point passes the tree and measures the background. When assembling the range image by transforming the $XYZ$ coordinates to spherical coordinates, the angle between the two points will not be $\Delta \varphi$ as expected but rather $\Delta \tilde{\varphi}$. That is visible as the aforementioned shadowing at object corners, and a similar thing happens on the other side of the object. The effect is negligible for measurements towards the front/back, as the EGO mostly moves forward. The shadows are NOT an issue with the dataset but rather something intrinsic to the measurement. It might be worth considering when constructing range images, though.
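As a rough back-of-the-envelope check (assumed numbers, not measured from the data): with an EGO velocity of $v = 10\,\mathrm{m/s}$ and a sweep period of $T = 0.05\,\mathrm{s}$ (20 Hz), a point fired at azimuth $\varphi$ (measured from the sweep start) is recorded after the EGO has moved

$$d \approx v \, \frac{\varphi}{2\pi} \, T .$$

Halfway through the sweep ($\varphi = \pi$) that is $d \approx 0.25\,\mathrm{m}$, so for a tree at range $r = 10\,\mathrm{m}$ to the side, the parallax between the tree and a distant background is on the order of $d/r \approx 0.025\,\mathrm{rad} \approx 1.4^\circ$, i.e. several range-image columns' worth of missing (or doubled) returns. That is about the size of the shadows and peaks visible above.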

I was able to construct "clean" range images by distorting the data. Given the rotation frequency of 20 Hz and the EGO velocity, you can simply shift each measurement forward by adding $\Delta y = \frac{\varphi}{2 \pi} \cdot \frac{v_y}{20\,\mathrm{Hz}}$. Unfortunately I don't have a nice picture of a single-sweep measurement at hand, but the effect is still visible. The top image is without velocity correction, the bottom one with it. If it were only one sweep there would not be any shadowing and the curved horizontal lines would be straight; those artifacts come from incorporating prior measurements.

image image
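A minimal sketch of that correction (my own illustrative function, assuming the azimuth is measured from the start of the sweep and that the EGO velocity is given in the lidar frame; the sign and zero point may need adjusting for the actual sensor):

```python
import numpy as np

def velocity_correct(points: np.ndarray, v_xy: np.ndarray, rotation_hz: float = 20.0) -> np.ndarray:
    """Shift each point along the ego velocity in proportion to how far the sensor
    had rotated (i.e. how much time had passed) when the point was recorded.

    points: (N, 3) xyz in the lidar frame; v_xy: ego velocity (vx, vy) in that frame.
    """
    azimuth = np.arctan2(points[:, 1], points[:, 0]) % (2.0 * np.pi)  # assumed sweep start at 0 rad
    dt = azimuth / (2.0 * np.pi) / rotation_hz  # elapsed time within the sweep [s]
    corrected = points.copy()
    corrected[:, 0] += dt * v_xy[0]
    corrected[:, 1] += dt * v_xy[1]
    return corrected
```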

I couldn't tell you for sure whether range imaging performs better or worse with the velocity correction, but the difference is most likely negligible. The takeaway for me is that the underlying assumption behind the creation of range images is violated.

👋 Kind regards

nightrome commented 1 year ago

Thank you. Nice explanation!

jingyibo123 commented 11 months ago

I think what's missing is motion compensation, or motion undistortion, to alleviate the jello effect that comes from the lidar emitting continuously. In high-precision lidar surveying we use per-point timestamps with nanosecond resolution to do this.

However, several points to confirm:

  1. Are the xyz data stored in the original pcd.bin files the raw lidar measurements, defined in the lidar frame at the exact receiving timestamp of each point, meaning the only processing done is a coordinate transform from spherical data to XYZ coordinates?
  2. Does the timestamp attribute of the whole pcd.bin correspond to the first or the last point of the scan?

Thanks for the confirmation @enrico-stauss @whyekit-motional

nightrome commented 11 months ago

@jingyibo123 The nuScenes dataset has motion compensation baked into the lidar files. This is why the lidar pattern, when viewed from a BEV perspective, 1) looks more or less like a circle and 2) does not change, even when moving at high velocities.
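One way to see this for yourself (the file path below is a placeholder): each .pcd.bin stores five float32 values per point (x, y, z, intensity, ring index), so a quick BEV scatter of a raw file already shows the pattern:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder path to one LIDAR_TOP sweep; each point is x, y, z, intensity, ring index.
points = np.fromfile('/data/sets/nuscenes/samples/LIDAR_TOP/<some_sweep>.pcd.bin',
                     dtype=np.float32).reshape(-1, 5)

# Bird's-eye view: the scan pattern stays roughly circular because motion
# compensation is already baked into the stored points.
plt.scatter(points[:, 0], points[:, 1], s=0.1)
plt.gca().set_aspect('equal')
plt.show()
```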

Shubodh commented 4 months ago

@nightrome Could you please confirm if the below summary of this discussion is accurate? I am trying to create accurate range images from LIDAR point clouds.

Summary of this discussion: Consider a situation where our sensor is measuring consecutive points A, B, C, D, E on a flat building. Say the sensor measured A and B, but because of the vehicle's motion it misses point C and "restarts" its measurement at D after the vehicle has moved, say, 0.5 m. Therefore, because of the vehicle's motion, there are bound to be shadows and there is not much we can do about this, because this is how the measurements are intrinsically taken by any LIDAR sensor on a moving vehicle; it is not a specific issue with this dataset. This is the case even with motion compensation, because motion compensation merely brings all the points into the same reference frame: it does nothing about the fact that the sensor missed some points (shadows) during its sweep. To make the range images better, @enrico-stauss suggested shifting the measurements along the y-axis, but they are not sure whether it improved their final range images.

To conclude, there is nothing conclusive in this thread that can be applied to get accurate range images. Perhaps I can try post-processing techniques once I have the range image, like image inpainting, etc. (Do let me know if you have any other suggestions for post-processing.)

enrico-stauss commented 4 months ago

Hi @Shubodh, I'll hop in on this thread again.

I don't think your summary is accurate, but to be honest it takes quite a bit of thinking to get a good understanding of the problem. You have to distinguish between the actually recorded points and what you want to achieve when generating a range image.

I'll try to explain in other words what I have written before. The data provided by NuScenes are the raw recorded points in cartesian coordinates and correctly represent the environment around the sensor. I don't know exactly which kind of motion compensation has been applied but I think we can skip over it for the sake of the discussion.

Now, when we start talking about range images, what is it that you want? I myself can think of two scenarios:

  1. Use the polar and azimuthal angles of the recorded points to map them to a grid that you call a range image.
  2. Use the raw recording order of the points to assign them to a grid and call that a range image.

The first scenario is what is typically done, but it results in ambiguities due to motion (mostly of the EGO, but also of other objects in the vicinity). In other words, it can happen that two points are measured at the same azimuthal angle. In my images below, this shows up as a peak in the point density. As you can imagine, if two points are measured at the same azimuthal angle, then there must be one point missing at another azimuthal angle.
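A sketch of scenario 1 (an illustrative function of my own, assuming points in the (x, y, z, intensity, ring) layout of the raw files); the hit counter makes the ambiguity explicit, since a cell with two hits implies an empty cell, i.e. a shadow, somewhere else in the same ring:

```python
import numpy as np

def to_range_image(points: np.ndarray, n_cols: int = 1080, n_rings: int = 32):
    """Scenario 1: bin points into a (ring, azimuth-column) grid by their angles.

    points: (N, 5) array with columns x, y, z, intensity, ring index.
    Returns the range per cell (0 where no point landed, i.e. a shadow) and the
    number of hits per cell (>1 where points collide, i.e. a peak).
    """
    x, y = points[:, 0], points[:, 1]
    ring = points[:, 4].astype(int)
    rng = np.linalg.norm(points[:, :3], axis=1)
    col = ((np.arctan2(y, x) + np.pi) / (2.0 * np.pi) * n_cols).astype(int) % n_cols

    image = np.zeros((n_rings, n_cols), dtype=np.float32)
    hits = np.zeros((n_rings, n_cols), dtype=np.int32)
    for r, c, d in zip(ring, col, rng):
        image[r, c] = d  # later points silently overwrite earlier ones in collided cells
        hits[r, c] += 1
    return image, hits
```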

The method that I suggested distorts the actual coordinates of the measured points for the sake of preserving as much of the data as possible to feed into a CNN. The range image is clean in the sense that the point density is relatively homogeneous, but it does not reflect the environment accurately. You could, however, use the distorted coordinates to create the range image and then feed the true coordinates as features. It would be very interesting to see how (and if) this affects the performance of a CNN-based range-imaging model. For multiple sweeps you'd have to create the range image per sweep, and I think it would break down quickly.

I hope this clarifies it a bit but as I said, it's not easy to wrap your head around. Kind regards