ri_return1 is the first return; ri_return2 is the second return.
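For concreteness, the two returns can be extracted roughly as in the official tutorial (a sketch using the frame_utils helpers; the exact signatures and return values may differ between releases):

```python
from waymo_open_dataset.utils import frame_utils

# `frame` is an open_dataset.Frame parsed from a TFRecord, as in the tutorial.
range_images, camera_projections, range_image_top_pose = (
    frame_utils.parse_range_image_and_camera_projection(frame))

# ri_index=0 selects ri_return1 (the strongest return),
# ri_index=1 selects ri_return2 (the second-strongest return).
points_ri1, _ = frame_utils.convert_range_image_to_point_cloud(
    frame, range_images, camera_projections, range_image_top_pose, ri_index=0)
points_ri2, _ = frame_utils.convert_range_image_to_point_cloud(
    frame, range_images, camera_projections, range_image_top_pose, ri_index=1)
# Each result is a list with one (N_i, 3) array per lidar.
```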
Does this mean that the frequencies of the camera and lidar are not the same, i.e. when the camera returns one image, the lidar returns more than two range images and you only save the strongest two?
You can read http://home.iitk.ac.in/~blohani/LiDAR_Tutorial/Multiple%20return%20LiDAR.htm to learn more about lidar multiple returns.
@peisun1115 Might you be able to answer the following:
@Yao-Shao @pwais Using the first return will be OK. All the point clouds you have seen in the other datasets correspond to the first return here.
The 2nd return is actually very informative for various tasks. You are encouraged to try it in your model. See a visualization of the 2nd return in our tutorial.
@xmyqsh @Yao-Shao The second return as-is is a useful feature for distinguishing vegetation. I think @peisun1115 is trying to highlight this result, which is (sadly) not available in the tutorial without modification:
first return:
second return:
My original question about timestamps was directed at the potential for measuring velocity, but the answer there is no.
While the second return looks useful for improving Waymo's own product, keep in mind this drive data is from 2017, and Waymo is currently upgrading the lidar in their fleet (as of November 2019). So the second return depicted above is not only an artifact of a proprietary sensor, but a deprecated one, too. So, from a research perspective, any result focusing on the second return is very unlikely to have any relevance outside of Waymo.
@pwais @xmyqsh
Here is another perspective from the geo-sensing domain. They say of lidars in that domain: "LIDAR systems can typically record up to 3 to 5 return pulses per emitted pulse."
https://www.microimages.com/documentation/TechGuides/77LASptSelect.pdf
And another geo-sensing article: "Multiple return 3d LiDAR scanners are one of the greatest inventions replacing Land Surveyor tasks ..." http://lidarmag.com/wp-content/uploads/PDF/LIDARMagazine_Whitfield-MultipleReturnMultipleData_Vol7No3.pdf
That should give you additional evidence that multiple returns have great relevance outside of Waymo.
Multiple LiDAR returns are common. The Waymo Open Dataset is trying to align the research community with real production data.
@peisun1115 Good information, thank you for sharing! In a word, the first return may not always be the strongest return, and the second return can sometimes help describe vegetation and the edges of obstacles.
I have a question about the annotation of the 3D laser objects. Is it based on the raw point cloud or the calibrated point cloud? If the answer is the former, then the annotated 3D laser objects are also affected by the rolling-shutter problem, and we would have to correct the annotations ourselves, after first correcting the 3D point clouds based on the per-point timestamps and the vehicle velocity.
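To make the kind of correction I have in mind concrete, here is a minimal constant-velocity de-skew sketch. The per-point timestamps and the velocity estimate are hypothetical inputs, not fields the dataset exposes for every lidar:

```python
import numpy as np

def deskew_constant_velocity(points, t_points, t_ref, v_lin, yaw_rate):
    """De-skew points captured at different times into the vehicle frame at t_ref,
    assuming the ego vehicle moves with constant linear velocity and yaw rate.

    points:   (N, 3) points in the vehicle frame at their own capture times.
    t_points: (N,) per-point timestamps [s] (hypothetical input).
    t_ref:    reference timestamp [s] to de-skew to.
    v_lin:    (3,) ego linear velocity in the vehicle frame [m/s].
    yaw_rate: ego yaw rate [rad/s].
    """
    dt = t_ref - t_points                                    # (N,)
    # Undo the ego translation accumulated between each capture time and t_ref.
    shifted = points - np.asarray(v_lin)[None, :] * dt[:, None]
    # Undo the accumulated yaw (a planar rotation is enough for a sketch).
    yaw = yaw_rate * dt
    cos_y, sin_y = np.cos(-yaw), np.sin(-yaw)
    x, y, z = shifted[:, 0], shifted[:, 1], shifted[:, 2]
    return np.stack([cos_y * x - sin_y * y,
                     sin_y * x + cos_y * y,
                     z], axis=-1)
```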
Data indeed spans several years:
+----+-----+---+
|year|split|num|
+----+-----+---+
|2017|train|248|
|2017| val| 84|
|2018|train|191|
|2018| val| 42|
|2019| val| 76|
|2019|train|359|
+----+-----+---+
+-------+-----+---+
|month |split|num|
+-------+-----+---+
|2017-10|train|194|
|2017-10|val |65 |
|2017-11|val |16 |
|2017-11|train|38 |
|2017-12|val |3 |
|2017-12|train|16 |
|2018-01|train|27 |
|2018-01|val |6 |
|2018-02|train|27 |
|2018-02|val |6 |
|2018-03|val |11 |
|2018-03|train|65 |
|2018-04|train|24 |
|2018-04|val |7 |
|2018-05|train|2 |
|2018-11|train|32 |
|2018-11|val |7 |
|2018-12|val |5 |
|2018-12|train|14 |
|2019-01|train|28 |
|2019-01|val |8 |
|2019-02|train|56 |
|2019-02|val |7 |
|2019-03|train|109|
|2019-03|val |27 |
|2019-04|train|21 |
|2019-04|val |3 |
|2019-05|val |31 |
|2019-05|train|145|
+-------+-----+---+
With respect to:
"Waymo Open Dataset is trying to align the research community with real production data."
If Waymo is genuinely trying to "align the research community" in any way, I'd strongly recommend:
@xmyqsh
The 3D lidar labels are done in the vehicle frame, so I think it is the raw point cloud in your terminology. I do not see why it is necessary to 'correct' anything. All points can be transformed to the same coordinate system. Objects (not the SDC itself) in the scene can be moving, but a good object detection model is supposed to handle that.
@peisun1115
virtual point cloud: the rolling shutter has been compensated
raw point cloud: still affected by the rolling shutter
@peisun1115
Sorry for disturbing you. I'm not very familiar with the annotation process for the point cloud.
Is the annotation based on the virtual point cloud or the raw point cloud defined above, or is it done some other way?
Please clarify this for me. I feel it is important, and I'm curious about it.
@xmyqsh
No worries. I am happy to help and to learn from you as well.
You can assume that we label on the point cloud returned by this function. So I think it is the raw point cloud in your context.
I do not know why you need to 'fix' the rolling shutter. Note that in the 'raw' point cloud, the points are in a virtual vehicle frame; ego motion is already compensated.
I am curious to know how you would 'fix' the rolling shutter effect, though.
@peisun1115
Got it! For the TOP laser, ego motion can be compensated using range_image_pose_compressed to bring the points into the virtual vehicle-frame pose, and you label the point cloud in that frame. That is a perfect label, no problem!
But for the other lasers, whose range_image_pose_compressed is empty, the labels will be affected by the rolling-shutter problem.
Am I right?
@xmyqsh
Yes, for the TOP lidar.
For the other lidars, all the points are in the same vehicle frame, so there is no need to worry about the rolling-shutter effect for them. https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/dataset.proto#L161
*We do plan to release per-point poses for all LiDARs in the future.
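For readers following the thread, the per-point compensation for the TOP lidar is conceptually just a chain of rigid transforms. A sketch with 4x4 homogeneous matrices (the real implementation lives in the dataset's range image utilities and also handles the range-to-Cartesian conversion):

```python
import numpy as np

def to_virtual_vehicle_frame(p_sensor,
                             T_vehicle_from_sensor,       # lidar extrinsic calibration
                             T_world_from_vehicle_pixel,  # per-pixel pose (range_image_pose_compressed)
                             T_world_from_vehicle_frame): # pose shared by the whole frame
    """Move one TOP-lidar point into the common (virtual) vehicle frame."""
    p = np.append(p_sensor, 1.0)                           # homogeneous point
    p_vehicle_at_capture_time = T_vehicle_from_sensor @ p  # sensor -> vehicle (at pixel time)
    p_world = T_world_from_vehicle_pixel @ p_vehicle_at_capture_time
    p_virtual = np.linalg.inv(T_world_from_vehicle_frame) @ p_world
    return p_virtual[:3]
```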
@peisun1115
Got it.
One last question: apart from using range_image_pose_compressed and the raw range image to get the virtual range image in the vehicle frame of a specific scan, what is the other use of range_image_pose_compressed and the raw range image? It seems that the virtual range image is what we ultimately need, and we can compute it from range_image_pose_compressed and the raw range image. So why not give us the virtual range image directly?
We try to release more information so that users can be creative. The 3D point cloud is a popular data format for research, but we think other formats can be useful as well. For example, the raw range image is a dense representation which can be used directly for various tasks.
@peisun1115
But I still think the virtual range image is a better and more realistic representation. When the autonomous driving car is running, can we get the range_image_pose_compressed info in real time and compensate the raw lidar sweep to obtain the virtual lidar point cloud in a specific frame?
Or is the raw point cloud / raw range image preferable for real-time use?
To echo @xmyqsh's expectations, I've talked to three to four dozen grad students and researchers who have used open datasets besides Waymo's. These nuisances involving camera-lidar-label sync, data encoding, etc. are simply at odds with the interests of most research topics, including more crucial areas like detection and forecasting. Many grad students can barely make it through NuScenes, which is relatively simple in comparison to this dataset.
Some of these details are relevant for topics in odometry and calibration. But the Waymo dataset isn't particularly useful for mapping problems in urban settings because the cloud is truncated above a few meters, there isn't much segment overlap, etc.
It would be tremendously helpful if the Waymo data were made available in a form that's compatible with an existing API, like NuScenes or KITTI, with sync issues resolved / removed. Without a standard solution here (either from Waymo or [more likely now] some independent project), perception benchmarks on this dataset will be essentially incomparable (despite the extensive tracking metrics code included in this repo).
@pwais This Waymo dataset should have removed the sync issues (the dataset is synced perfectly), just like KITTI, right? @peisun1115 And for NuScenes and Argoverse, they have released tools to resolve the sync, right? @pwais
@xmyqsh NuScenes, Lyft Level 5, and Argoverse all provide point clouds as arrays of (x, y, z) points in either the sensor's frame or the ego frame.
NuScenes -- Cameras are only at 12 Hz, and lidar at 20 Hz. Their code offers simple motion correction: move the lidar points to the world frame using the ego pose at lidar time, then move the points from the world frame to the target frame (e.g. camera) using the ego pose at target time. This works fine in my experience; I don't think they drive very fast, though. Cuboid labels are only at 2 Hz, but they also provide code to interpolate (for moving cars), and this also works pretty well.
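The "simple motion correction" above amounts to chaining two ego poses; here is a sketch in plain NumPy, independent of the nuscenes-devkit API (all matrices are 4x4 rigid transforms):

```python
import numpy as np

def lidar_points_to_camera_frame(points_lidar,
                                 T_ego_from_lidar,          # lidar extrinsic
                                 T_world_from_ego_lidar_t,  # ego pose at lidar timestamp
                                 T_world_from_ego_cam_t,    # ego pose at camera timestamp
                                 T_ego_from_cam):           # camera extrinsic
    """lidar -> ego@t_lidar -> world -> ego@t_cam -> camera, for (N, 3) points."""
    pts = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # (N, 4)
    T = (np.linalg.inv(T_ego_from_cam)
         @ np.linalg.inv(T_world_from_ego_cam_t)
         @ T_world_from_ego_lidar_t
         @ T_ego_from_lidar)
    return (pts @ T.T)[:, :3]
```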
Lyft Level 5 (uses the NuScenes API) -- Cameras, lidar, and labels are all sampled at 5 Hz. Cameras are global shutter and are synchronized to the lidar. The lidar scan begins at the rear, so the rear camera shows a visible lidar break. The center of the scan is very dense, as dense as Waymo. A demo of both of these things is below:
Despite Lyft's camera/lidar sync, there's quite a bit of drift. Lidar scans are at 10 Hz (the data release is just 5 Hz), and Lyft drives areas with higher speed limits of 35 mph+. I'm not sure if they're using lidar-based localization. For static objects, simple motion correction of the cloud (provided through the NuScenes / Lyft code) yields camera-lidar agreement as good as demonstrated above in the Waymo screenshots. For moving objects, though, as you can see, there can be quite a lot of drift.
Argoverse -- They have two lidars sweeping 180 degrees out of phase and fuse them into a single 10 Hz scan, so their lidar-camera sync issue is potentially more complicated. However, I believe they might not exceed 25 mph, and cameras are recorded at 30 Hz. They recommend simple motion correction using ego pose data (I believe they're using lidar-based localization in the dataset), similar to what NuScenes does. Through this correction, and perhaps thanks to the high camera frame rate, I've seen very good camera-lidar agreement.
For each of the datasets above, getting a training artifact suitable for a detection or forecasting problem is not only relatively straightforward; they all also offer conversion to KITTI format. Furthermore, obtaining a point cloud in the ego frame is trivial and most certainly does not require the overhead I've seen with Waymo: (1) TensorFlow, (2) the large overhead of a TensorFlow graph and session, (3) 6-7 seconds of processing time on a modern Intel CPU.
None of the datasets above offer second-return info. That said, all of them do offer examples of trees, precipitation, windows, etc.
@pwais Thanks for sharing! Very useful information! But I cannot agree with you completely. For example, for the Argoverse dataset, they use two 16-line lidars, both sweeping 360 degrees, not 180. The purpose of these two overlaid lidars is presumably to emulate a 32-line lidar. And they should be using multi-sensor fusion localization, because the poses are very dense.
As for Waymo using a TensorFlow graph to process the point cloud: this enables you to run it on GPU, which NumPy cannot. In terms of efficiency, among the three generations of deep learning architectures (layer-based, graph/tensor-based, and program-based) and NumPy, I think the graph/tensor-based version is the fastest, as long as you use a GPU/TPU.
@xmyqsh Argoverse: yes, both lidars do 360-degree sweeps, but they're 180 degrees out of phase, so camera sync is potentially more complicated. Sorry if that wasn't clear, but I think we're saying the same thing.
Using a GPU/TPU to decode point clouds has entertainment value only. It makes no sense to waste GPU memory and compute on a straightforward data transformation that will not change during the course of most typical experiments. Moreover, PyTorch is extremely popular now, so having such a key part of the Waymo code implemented in TensorFlow is a non-starter for many researchers. https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1efb/
We provide the raw range image for the TOP LiDAR and virtual range images for the other LiDARs, but they can all be converted to the popular point-set (x, y, z, f1, f2, ...) format. The raw range image is useful for various tasks; if you don't find it useful for your research, just convert it to what you want. However, if we directly provided (x, y, z, f1, f2, ...), then users could not go back to the raw range images.
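As a rough illustration of why the range image is the more primitive representation: going from a range image to (x, y, z) is just a spherical-to-Cartesian projection, whereas the reverse requires re-binning points into beams. A sketch assuming per-row beam inclinations and per-column azimuths are known (the real code also applies the extrinsic and, for the TOP lidar, the per-pixel poses):

```python
import numpy as np

def range_image_to_xyz(range_image, inclinations, azimuths):
    """range_image:  (H, W) range in meters (channel 0 of the decoded range image).
    inclinations: (H,) beam inclination per row [rad].
    azimuths:     (W,) azimuth per column [rad].
    Returns (H, W, 3) points in the sensor frame."""
    incl = inclinations[:, None]   # (H, 1)
    az = azimuths[None, :]         # (1, W)
    x = range_image * np.cos(incl) * np.cos(az)
    y = range_image * np.cos(incl) * np.sin(az)
    z = range_image * np.sin(incl)
    return np.stack([x, y, z], axis=-1)
```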
In terms of the speed of processing the point cloud on CPU, try to implement it properly. Simply running our tutorial on the public sandbox (the machine is not very good; a powerful machine can do better) takes 1.9 seconds. Not to mention that you can try it on GPU/TPU. If you have a faster implementation that decodes the same format, let us know. To step back, even if it is slow, it does not matter: just run a preprocessing job to convert the data to your format. It is a one-off.
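For anyone who only needs (x, y, z) arrays, a one-off preprocessing job along the lines of the tutorial could look roughly like this (a sketch; the output naming is a placeholder and the helper signatures may differ between releases):

```python
import numpy as np
import tensorflow as tf
from waymo_open_dataset import dataset_pb2 as open_dataset
from waymo_open_dataset.utils import frame_utils

def convert_segment(tfrecord_path, out_prefix):
    """Decode every frame once and cache the first-return point cloud as .npz."""
    dataset = tf.data.TFRecordDataset(tfrecord_path, compression_type='')
    for i, data in enumerate(dataset):
        frame = open_dataset.Frame()
        frame.ParseFromString(bytearray(data.numpy()))
        range_images, camera_projections, range_image_top_pose = (
            frame_utils.parse_range_image_and_camera_projection(frame))
        points, _ = frame_utils.convert_range_image_to_point_cloud(
            frame, range_images, camera_projections, range_image_top_pose)
        # `points` is a list with one (N_i, 3) array per lidar; merge them.
        np.savez_compressed('%s_%04d.npz' % (out_prefix, i),
                            points=np.concatenate(points, axis=0))
```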
@pwais Argoverse: It seems weird to sweep 360 degrees while being only 180 degrees out of phase. You can try to decouple them like this:
Also, Argoverse has solved the rolling-shutter problem: all the point clouds are provided in the vehicle frame and timestamped, and dense pose information is provided. It is easy to do motion compensation on it.
@peisun1115 We can go back to the raw range image if you provide the raw point cloud. The difference is whether the rolling-shutter problem has been compensated or not. Most datasets only provide the compensated, rolling-shutter-free point cloud, while Waymo also provides the raw, rolling-shutter-affected point cloud, along with the tools and data to compensate for it.
What a pity that they only provide the tools and data to compensate for the rolling-shutter problem, but the user cannot develop an odometry or SLAM algorithm on the Waymo dataset to compensate for the rolling shutter themselves, because the cloud is truncated above a few meters and there is not much segment overlap, as @pwais said. That makes this dataset imperfect from certain points of view.
@peisun1115 @pwais What I want to know is this: in a real-time autonomous driving car, the lidar's feedback is the raw point cloud / raw range image, and our perception algorithm actually runs on that raw data. The virtual/accurate point cloud can only be obtained by using SLAM to calibrate the point cloud, which may not be real-time.
So it is valuable to develop a real-time SLAM algorithm that does localization, mapping / point cloud calibration, and perception together. That is the value of real-time SLAM methods in the autonomous driving environment.
What a pity that this dataset cannot provide us with such a realistic environment, if what @pwais said is true and this dataset cannot support SLAM.
It is good that this dataset exposes this problem through its data format. It is a perfect dataset for perception users to get a perfect result in a somewhat idealized setting. It would be even better if we could develop real-time SLAM and perception algorithms together on it, as @pwais expects.
Waymo should already have developed such a real-time online point cloud calibration algorithm, right? @peisun1115
On the website, it says that two range images are provided for each lidar, one for each of the two strongest returns. Why are there multiple range images in one frame? Is there any relationship between them?