nutonomy / nuscenes-devkit

The devkit of the nuScenes dataset.
https://www.nuScenes.org

Question about extracting sequence of images, LiDAR, and radar data #239

Closed brade31919 closed 4 years ago

brade31919 commented 4 years ago

Hi nuscenes-devkit team, I am currently trying to extract a video-sequence dataset from nuScenes (similar to the KITTI odometry dataset). Specifically, I would like to render image sequences where each image has its corresponding LiDAR and radar data. My method follows the suggestions in #160 and #180, but I encountered a problem:

The matched LiDAR data has some inconsistent measurements on the boundary of the car, as shown in the attached images: the discontinuities in the LiDAR points do not align with the edge of the car. I know this might result from other causes (even bugs in my code), but I would like to double-check one possible issue:

According to the description in #160, the timestamp of the LiDAR scan is the time when the full rotation of the current LiDAR frame is completed. Let delta_lidar be the time difference between two consecutive LiDAR sweeps (delta_lidar is roughly 1/20 s). Then, for example, if the LiDAR scans clockwise starting from the left side and the timestamp of the LiDAR scan triggering the camera is T, the timestamp of the front camera will be around T - (3/4) * delta_lidar (because the timestamp T is the finishing time of the scan). But if we just look for the LiDAR scan with the timestamp closest to the front camera's, we will actually use the (T - delta_lidar) one instead of the T one (correct me if I am wrong or the explanation is unclear). How can I solve this problem? Can I get more details on the direction in which the LiDAR starts (and ends) its scan?
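The naive closest-timestamp association can be sketched as follows (a minimal illustration with hypothetical timestamps in microseconds; `closest_sweep` is not a devkit function):

```python
# Naive nearest-timestamp matching between a camera frame and lidar sweeps.
# All timestamps here are hypothetical, in microseconds.

def closest_sweep(cam_ts, sweep_timestamps):
    """Return the index of the lidar sweep whose timestamp is closest to cam_ts."""
    return min(range(len(sweep_timestamps)),
               key=lambda i: abs(sweep_timestamps[i] - cam_ts))

# Two consecutive sweeps, 50 ms (1/20 s) apart; each timestamp marks the
# END of its rotation.
delta_lidar = 50_000  # microseconds
T = 1_000_000
sweeps = [T - delta_lidar, T]

# CAM_FRONT is triggered ~3/4 of a rotation before the sweep-end timestamp T,
# so its timestamp sits closer to the PREVIOUS sweep's end time.
cam_front_ts = T - int(0.75 * delta_lidar)

print(closest_sweep(cam_front_ts, sweeps))  # → 0, i.e. (T - delta_lidar), not T
```

This reproduces the mismatch described above: the sweep that actually triggered CAM_FRONT (the one ending at T) is not the one selected.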

Or do you have any suggestions for generating a video (or sequence) dataset from nuScenes?

Thank you in advance for any help!

Greetings, Juan-Ting

holger-motional commented 4 years ago

The discontinuities on LiDAR points do not match with the edge of the car.

This is expected behavior: a parallax issue (https://en.wikipedia.org/wiki/Parallax). Since the lidar and camera are not in the same position, they see objects from different viewpoints. For far-away points this makes no difference, but for close-by points it can easily look like there is a 20-30 cm offset in the lidar points rendered on an image. There is no way to entirely correct for this in software.
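A rough back-of-the-envelope check of why parallax hurts mostly at close range (the 0.5 m lidar-camera baseline below is an assumed value for illustration, not the actual nuScenes sensor geometry):

```python
import math

# A lidar point at range d, projected into a camera displaced by baseline b,
# appears angularly shifted by roughly atan(b / d) relative to where the
# camera sees that direction. The shift shrinks quickly with range.
def parallax_deg(baseline_m, range_m):
    return math.degrees(math.atan2(baseline_m, range_m))

for d in (2, 5, 20, 100):
    print(f"range {d:>3} m -> parallax ~{parallax_deg(0.5, d):.2f} deg")
# range   2 m -> ~14 deg; range 100 m -> ~0.3 deg
```

This is why the edges of nearby objects show a visible offset while distant points line up well.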

.. we will actually use (T - delta_lidar) one instead of the T one

Yes, the camera is near-instantaneous, but the lidar records continuously, so every lidar "column" is captured at a different timestamp; the columns are merged to form a single, motion-corrected point cloud. I don't think you can improve on this, and as described above it is probably much less important than parallax.
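The per-column timing can be modeled with a small sketch (assuming, as discussed above, that the sweep timestamp marks the end of the rotation and that a full rotation takes delta_lidar seconds; this is a simplification, not the devkit's actual motion-correction code):

```python
# Simple model of per-column capture times within one lidar rotation.
# T_end: the sweep's timestamp (end of rotation), in seconds.
# delta_lidar: the rotation period (~1/20 s).
# azimuth_frac in [0, 1): how far through the rotation this column was captured.
def column_time(T_end, delta_lidar, azimuth_frac):
    return T_end - delta_lidar * (1.0 - azimuth_frac)

# The first column of a sweep is a full rotation period older than the
# sweep timestamp itself:
print(column_time(1.00, 0.05, 0.0))  # → 0.95
```

So points within one "single" point cloud can differ in age by up to ~50 ms, which motion correction compensates for.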

Or do you have any suggestion on generating video (or say sequence) dataset from the nuScenes dataset?

What's the purpose of the video dataset? I'd just take it as-is. In fact I did that once at https://youtu.be/DQ2QywLIuOc , but unfortunately it was very slow and I don't have the code anymore.

brade31919 commented 4 years ago

Hi @holger-nutonomy,

Thank you for your prompt reply.

This is expected behavior parallax issue (https://en.wikipedia.org/wiki/Parallax)...

Yes, I understand that parallax contributes more to the misalignment at the edge of the car.

The example I used was to point out that maybe we shouldn't just grab the LiDAR sweep closest to the camera's timestamp, as in the figure I attached.

(Sorry for the ugly drawing...) In the example, the LiDAR begins and ends its scan on the left side of the car and scans clockwise. So the LiDAR scan triggers CAM_FRONT, CAM_FRONT_RIGHT, CAM_BACK_RIGHT, and then CAM_BACK (there was no space to draw FRONT_LEFT and BACK_LEFT, so let's ignore them), and each camera records its timestamp. Then the whole scan ends and records the LiDAR timestamp T. If we now want to find the LiDAR scan for CAM_FRONT based only on timestamp, we will consider LiDAR scan (T - delta_lidar) the closest instead of LiDAR scan T. However, LiDAR scan T is the one that actually triggered CAM_FRONT at the corresponding timestamp, not LiDAR scan (T - delta_lidar). So if we always pick the closest LiDAR scan using only the timestamp, we will pick the wrong one for some cameras (depending on the camera's orientation and the direction in which the LiDAR begins and ends its scan). I just want to confirm whether this is truly an issue. If it is, can I know the direction in which the LiDAR begins and ends its scan? If not, I would like to know why it does not matter.
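One way to implement the corrected association is to compare the camera timestamp against each sweep's expected trigger time for that camera, rather than the raw sweep-end timestamp. A sketch (the per-camera trigger fractions below are rough guesses assuming a left-start, clockwise scan, not calibrated values; `matching_sweep` is a hypothetical helper):

```python
# Fraction of the rotation (starting on the left, scanning clockwise) at
# which the lidar crosses each camera's FOV center. These are illustrative
# assumptions, not measured nuScenes calibration.
TRIGGER_FRAC = {
    "CAM_FRONT": 0.25,
    "CAM_FRONT_RIGHT": 0.40,
    "CAM_BACK_RIGHT": 0.60,
    "CAM_BACK": 0.75,
}

def matching_sweep(cam_name, cam_ts, sweep_end_timestamps, delta_lidar):
    """Pick the sweep whose predicted trigger time for this camera is closest to cam_ts."""
    frac = TRIGGER_FRAC[cam_name]
    def trigger_time(t_end):
        # The camera fired (1 - frac) of a rotation before the sweep ended.
        return t_end - (1.0 - frac) * delta_lidar
    return min(range(len(sweep_end_timestamps)),
               key=lambda i: abs(trigger_time(sweep_end_timestamps[i]) - cam_ts))

# With the same hypothetical numbers as before, CAM_FRONT at T - 0.75*delta
# is now matched to the sweep ending at T (index 1), not the earlier one:
print(matching_sweep("CAM_FRONT", 962_500, [950_000, 1_000_000], 50_000))  # → 1
```

The naive nearest-timestamp rule would have picked index 0 here, which is exactly the mismatch described above.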

What's the purpose of the video dataset?

Previously, I tried monocular depth estimation on the dataset. For that setting, using RGB + LiDAR pairs from the keyframes is sufficient. However, if I want to further explore self-supervised learning or odometry applications, sequence data with a higher frame rate would be better (the frame rate of the keyframe samples is 2 Hz). After all, the data has already been collected, and tasks such as depth estimation, trajectory estimation, and self-supervised learning are also interesting apart from 3D detection.

holger-motional commented 4 years ago

Apparently the lidar starts on the left side (CAM_LEFT) and moves clockwise (from a bird's-eye view). I agree that you can probably get a slightly better timestamp association using this method. Let us know if the differences are significant.

brade31919 commented 4 years ago

Let us know if the differences are significant.

Here are some results (see the attached images). The differences are not significant, but alignment is slightly better on object boundaries. After all, the difference is at most one LiDAR time step (1/20 s) and depends on the relative motion.

Regarding data association between radar and RGB, I think nothing can be done to make it better...

holger-motional commented 4 years ago

Nice work!

ghost commented 2 years ago

Nice work!

Since the exposure of a camera is triggered when the top LiDAR sweeps across the center of the camera's FOV, why does the documentation say "Note that the cameras run at 12Hz while the LIDAR runs at 20Hz. The 12 camera exposures are spread as evenly as possible across the 20 LIDAR scans"? In my opinion, the camera frequency should equal the LiDAR frequency, i.e., 20 Hz.

@holger-motional

holger-motional commented 2 years ago

@nm10k The frequency of cameras is definitely 12Hz in the sense that we take 12 pictures per second (nominally). You can think of it as essentially having 20 opportunities per second to trigger the camera and we use 12 of those and try to spread them out as evenly as possible.
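The scheduling idea can be illustrated with a small sketch (this is one plausible way to spread 12 exposures over 20 trigger opportunities, not the actual firmware logic):

```python
# 20 lidar rotations per second give 20 trigger opportunities ("slots").
# Spread 12 camera exposures as evenly as possible across them by mapping
# each exposure index to its nearest proportional slot.
slots = sorted({round(i * 20 / 12) for i in range(12)})

print(slots)       # 12 distinct slot indices out of 0..19
print(len(slots))  # → 12
```

The resulting gaps alternate between one and two skipped rotations, which is as even as an integer schedule allows.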