waymo-research / waymo-open-dataset

Waymo Open Dataset
https://www.waymo.com/open

Range Image Returns #45

Closed Yao-Shao closed 4 years ago

Yao-Shao commented 4 years ago

On the website, it says that two range images are provided for each lidar, one for each of the two strongest returns. Why are there multiple range images in one frame? Is there any relationship between them?

message Laser {
  optional LaserName.Name name = 1;
  optional RangeImage ri_return1 = 2;  // range image of the first return
  optional RangeImage ri_return2 = 3;  // range image of the second return
}
peisun1115 commented 4 years ago

ri_return1 is the first return; ri_return2 is the second return.
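
For concreteness, here is a minimal sketch of reading both returns from a Frame proto. It assumes the waymo_open_dataset Python package, with range images stored as zlib-compressed, serialized MatrixFloat protos (as described in dataset.proto and the official tutorial); the helper names are illustrative.

import zlib

import numpy as np

from waymo_open_dataset import dataset_pb2

def decode_range_image(compressed_bytes):
    # range_image_compressed holds a zlib-compressed, serialized MatrixFloat.
    matrix = dataset_pb2.MatrixFloat()
    matrix.ParseFromString(zlib.decompress(compressed_bytes))
    return np.array(matrix.data).reshape(list(matrix.shape.dims))

def range_images_for_frame(frame):
    # Map each laser name to its (first return, second return) range images.
    return {
        laser.name: (
            decode_range_image(laser.ri_return1.range_image_compressed),
            decode_range_image(laser.ri_return2.range_image_compressed))
        for laser in frame.lasers
    }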

Yao-Shao commented 4 years ago

Does this mean that the frequencies of the camera and lidar are not the same? Say, when the camera returns one image, the lidar returns more than two range images and you only save the strongest two?

peisun1115 commented 4 years ago

You can read http://home.iitk.ac.in/~blohani/LiDAR_Tutorial/Multiple%20return%20LiDAR.htm to learn more about lidar multiple returns.

pwais commented 4 years ago

@peisun1115 Might you be able to answer the following:

peisun1115 commented 4 years ago
  1. There is no 10 ms difference between the two returns. The two returns are from a single pulse, so the effects of ego motion between the returns are negligible.
  2. Waymo is going to publish baseline results on the dataset. The amount of detail to be published is yet to be decided. We encourage users to be creative in making use of the data and doing research. How to use multiple returns depends on your model.
xmyqsh commented 4 years ago

@Yao-Shao @pwais Using the first return will be OK. All the point clouds you have seen in other datasets correspond to the first return here.

peisun1115 commented 4 years ago

The 2nd return is actually very informative for various tasks. We encourage you to try it in your model. See a visualization of the 2nd return in our tutorial.
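
A minimal sketch of such a visualization (not the tutorial's exact code; it reuses the illustrative decode_range_image helper sketched earlier and assumes channel 0 of the decoded range image is range, per dataset.proto):

import matplotlib.pyplot as plt

from waymo_open_dataset import dataset_pb2

def show_second_return_top(frame):
    # Plot the range channel of the second return for the TOP lidar only.
    for laser in frame.lasers:
        if laser.name != dataset_pb2.LaserName.TOP:
            continue
        ri2 = decode_range_image(laser.ri_return2.range_image_compressed)
        plt.figure(figsize=(16, 3))
        plt.imshow(ri2[..., 0], cmap='gray', vmin=0)  # channel 0: range (m)
        plt.title('TOP lidar, second return (range)')
        plt.show()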

pwais commented 4 years ago

@xmyqsh @Yao-Shao As-is, the second return is a useful feature for distinguishing vegetation. I think @peisun1115 is trying to highlight this result, which is (sadly) not available in the tutorial without modification:

first return: (screenshot)

second return: (screenshot)

My original question about timestamps was directed at the potential for measuring velocity, but the answer there is no.

While the second return looks useful for improving Waymo's own product, keep in mind this drive data is from 2017, and Waymo is currently upgrading the lidar in their fleet (as of November 2019). So the second return depicted above is not only an artifact of a proprietary sensor, but of a deprecated one, too. From a research perspective, any result focusing on the second return is very unlikely to have relevance outside of Waymo.

peisun1115 commented 4 years ago

@pwais @xmyqsh

  1. Our dataset contains data ranging from 2017 to 2019, so it is not a deprecated sensor.
  2. The 2nd return is available for other LiDARs as well, such as the Velodyne HDL-32E, so research on signals like this should generalize to other users. You can find more by searching for "velodyne number of returns".
peisun1115 commented 4 years ago

@pwais @xmyqsh

Here is another perspective from the geo-sensing domain. They say of lidars in that domain: "LIDAR systems can typically record up to 3 to 5 return pulses per emitted pulse."

https://www.microimages.com/documentation/TechGuides/77LASptSelect.pdf

And another geo-sensing article: "Multiple return 3d LiDAR scanners are one of the greatest inventions replacing Land Surveyor tasks ..." http://lidarmag.com/wp-content/uploads/PDF/LIDARMagazine_Whitfield-MultipleReturnMultipleData_Vol7No3.pdf

That should give you additional evidence that multiple returns have great relevance outside of Waymo.

Multiple LiDAR returns is common. Waymo Open Dataset is trying to align the research community with real production data.

xmyqsh commented 4 years ago

@peisun1115 Thanks for sharing this useful information! In a word, the first return may not always be the strongest return, and the second return can sometimes help describe vegetation and the edges of obstacles.

I have a question about the annotation of the 3D laser objects. Are the labels based on the raw point cloud or the calibrated point cloud? If the former, the annotated 3D laser objects are also affected by the rolling-shutter problem, and we would need to correct the annotations ourselves after first correcting the 3D point clouds using the timestamped points and the vehicle velocity, as sketched below.
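
For illustration, a hedged sketch of the first-order correction described above, assuming (hypothetically) per-point timestamps and a constant ego velocity, and ignoring rotation:

import numpy as np

def compensate_constant_velocity(points_xyz, point_times, t_ref, ego_velocity):
    # points_xyz: (N, 3) points in the vehicle frame at their own capture times.
    # point_times: (N,) per-point timestamps in seconds (hypothetical input).
    # t_ref: timestamp of the target vehicle frame.
    # ego_velocity: (3,) ego linear velocity in m/s, assumed constant.
    # Static world points appear shifted opposite to the ego displacement.
    dt = (t_ref - np.asarray(point_times))[:, None]
    return np.asarray(points_xyz) - dt * np.asarray(ego_velocity)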

pwais commented 4 years ago

Data indeed spans several years:

+----+-----+---+
|year|split|num|
+----+-----+---+
|2017|train|248|
|2017|  val| 84|
|2018|train|191|
|2018|  val| 42|
|2019|  val| 76|
|2019|train|359|
+----+-----+---+
+-------+-----+---+
|month  |split|num|
+-------+-----+---+
|2017-10|train|194|
|2017-10|val  |65 |
|2017-11|val  |16 |
|2017-11|train|38 |
|2017-12|val  |3  |
|2017-12|train|16 |
|2018-01|train|27 |
|2018-01|val  |6  |
|2018-02|train|27 |
|2018-02|val  |6  |
|2018-03|val  |11 |
|2018-03|train|65 |
|2018-04|train|24 |
|2018-04|val  |7  |
|2018-05|train|2  |
|2018-11|train|32 |
|2018-11|val  |7  |
|2018-12|val  |5  |
|2018-12|train|14 |
|2019-01|train|28 |
|2019-01|val  |8  |
|2019-02|train|56 |
|2019-02|val  |7  |
|2019-03|train|109|
|2019-03|val  |27 |
|2019-04|train|21 |
|2019-04|val  |3  |
|2019-05|val  |31 |
|2019-05|train|145|
+-------+-----+---+

With respect to:

Waymo Open Dataset is trying to align the research community with real production data.

If Waymo is genuinely trying to "align the research community" in any way, I'd strongly recommend:

peisun1115 commented 4 years ago

@xmyqsh

The 3D lidar labels are done in the vehicle frame, so I think it is the raw point cloud in your terminology. I do not see why it is necessary to 'correct' anything: all points can be transformed to the same coordinate system. Objects in the scene (other than the SDC itself) can be moving, but a good object detection model is supposed to handle that.

xmyqsh commented 4 years ago

@peisun1115

virtual point cloud (rolling shutter corrected): (screenshot)
raw point cloud (with rolling shutter): (screenshot)

@peisun1115 Sorry to bother you. I'm not very familiar with the point cloud annotation process. Is the annotation based on the virtual point cloud or the raw point cloud as defined above, or done some other way? Please clarify this for me; I feel it is important, and I'm curious about it.

peisun1115 commented 4 years ago

@xmyqsh

No worries. I am happy to help and learn from you as well.

You can assume that we label on the point cloud returned by this function. So I think it is the raw point cloud in your context.

I do not know why you need to 'fix' the rolling shutter. Note that in the 'raw' point cloud, the points are already in a virtual vehicle frame; ego motion is already compensated.

I am curious to know how you 'fix' the rolling shutter effect though.
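
For reference, a rough sketch of the point-cloud pipeline being referred to, following the frame_utils helpers used in the official tutorial (the exact return values of parse_range_image_and_camera_projection vary between releases, and frame_point_cloud is an illustrative wrapper):

from waymo_open_dataset.utils import frame_utils

def frame_point_cloud(frame, ri_index=0):
    # ri_index=0 selects the first return, ri_index=1 the second return.
    parsed = frame_utils.parse_range_image_and_camera_projection(frame)
    range_images, camera_projections = parsed[0], parsed[1]
    range_image_top_pose = parsed[-1]  # per-pixel pose (TOP lidar only)
    points, cp_points = frame_utils.convert_range_image_to_point_cloud(
        frame, range_images, camera_projections, range_image_top_pose,
        ri_index=ri_index)
    # One (N_i, 3) array per lidar, all expressed in the vehicle frame.
    return points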

xmyqsh commented 4 years ago

@peisun1115 Got it! For the TOP laser, ego motion can be compensated via range_image_pose_compressed into the virtual vehicle-frame pose, and you label the point cloud in that frame. Those labels are accurate. No problem!

But for the other lasers, whose range_image_pose_compressed is empty, the labels will be affected by the rolling-shutter problem. Am I right?

peisun1115 commented 4 years ago

@xmyqsh

Yes for TOP Lidar.

For the other lidars, all the points are in the same vehicle frame, so there is no need to worry about the rolling-shutter effect for them. https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/dataset.proto#L161

We do plan to release per-point poses for all LiDARs in the future.
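
A small hedged sketch of checking which lasers currently carry a per-pixel pose (expected to be only the TOP lidar, per the discussion above):

def lasers_with_pixel_pose(frame):
    # range_image_pose_compressed is a bytes field on RangeImage; it is empty
    # for lasers that do not ship a per-pixel pose.
    return [laser.name for laser in frame.lasers
            if len(laser.ri_return1.range_image_pose_compressed) > 0]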

xmyqsh commented 4 years ago

@peisun1115 Got it. One last question: besides using range_image_pose_compressed and the raw range image to get the virtual range image in the vehicle frame of a specific scan, what other uses do range_image_pose_compressed and the raw range image have?

It seems the virtual range image is what we ultimately need, and we can derive it from range_image_pose_compressed and the raw range image. So why not provide the virtual range image directly?

peisun1115 commented 4 years ago

We try to release more information so that users can be creative. A 3D point cloud is a popular data format for research, but we think other formats can be useful as well. For example, the raw range image is a dense representation which can be used directly for various tasks.
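
To illustrate using the dense representation directly, a hedged sketch that treats the decoded range image as an image-like feature map (channel layout per dataset.proto; decode_range_image is the illustrative helper sketched earlier):

import numpy as np

def range_image_features(ri):
    # ri: (H, W, 4) decoded range image, e.g. from decode_range_image above.
    # Channels: range, intensity, elongation, is_in_no_label_zone.
    # Pixels with no return have range < 0 and are zeroed out here.
    features = ri[..., :3].astype(np.float32)
    valid = (ri[..., 0] >= 0).astype(np.float32)[..., None]
    return features * valid  # (H, W, 3), ready for a 2D model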

xmyqsh commented 4 years ago

@peisun1115 But I still think the virtual range image is a better and more realistic representation. When the autonomous car is actually running, can we get the range_image_pose_compressed info in real time and compensate the raw lidar sweep to get the virtual lidar point cloud in a specific frame? Or is the raw point cloud / raw range image preferred for real-time use?

pwais commented 4 years ago

To echo @xmyqsh's expectations, I've talked to three to four dozen grad students and researchers who have used open datasets besides Waymo's. These nuisances involving camera-lidar-label sync, data encoding, etc. are simply at odds with the interests of most research topics, including more crucial areas like detection and forecasting. Many grad students can barely make it through NuScenes, which is relatively simple compared to this dataset.

Some of these details are relevant for topics in odometry and calibration. But the Waymo dataset isn't particularly useful for mapping problems in urban settings: the cloud is truncated above a few meters, there isn't much segment overlap, etc.

It would be tremendously helpful if the Waymo data were made available in a form that's compatible with an existing API, like NuScenes or KITTI, with sync issues resolved / removed. Without a standard solution here (either from Waymo or [more likely now] some independent project), perception benchmarks on this dataset will be essentially incomparable (despite the extensive tracking metrics code included in this repo).

xmyqsh commented 4 years ago

@pwais This Waymo dataset should have removed the sync issues (i.e. the dataset is synced perfectly), just like KITTI. Right? @peisun1115 And NuScenes and Argoverse have released tools to resolve sync. Right? @pwais

pwais commented 4 years ago

@xmyqsh NuScenes, Lyft Level 5, and Argoverse all provide point clouds as arrays of (x, y, z) points in either the sensor's frame or the ego frame.

NuScenes -- Cameras are only at 12Hz, and lidar at 20Hz. Their code offers simple motion correction: move the lidar points to world frame using ego pose at lidar time, then move points from world frame to target frame (e.g. camera) using ego pose at target time. Works fine in my experience; I don't think they drive very fast though. Cuboid labels are only at 2Hz, but they also provide code to interpolate (for moving cars), and this also works pretty well.
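
For reference, a hedged sketch of that simple motion correction in plain numpy (just the math described above, not the actual NuScenes devkit calls; argument names are illustrative and all poses are 4x4 homogeneous transforms):

import numpy as np

def motion_correct(points_xyz, lidar_to_ego, ego_to_world_at_lidar_t,
                   ego_to_world_at_cam_t, cam_to_ego):
    # points_xyz: (N, 3) in the lidar frame at lidar time.
    # Chain: lidar -> ego -> world (pose at lidar time)
    #        -> ego (pose at camera time) -> camera.
    pts = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # homogeneous
    transform = (np.linalg.inv(cam_to_ego)
                 @ np.linalg.inv(ego_to_world_at_cam_t)
                 @ ego_to_world_at_lidar_t
                 @ lidar_to_ego)
    return (transform @ pts.T).T[:, :3]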

Lyft Level 5 (uses the NuScenes API) -- Cameras, lidar, and labels are all sampled at 5Hz. Cameras are global shutter and synchronized to lidar. The lidar scan begins at the rear, so the rear camera shows a visible lidar break. The center of the scan is very dense, as dense as Waymo's. Demo of both of these things below:

(screenshot)

Despite Lyft's camera/lidar sync, there's quite a bit of drift. Lidar scans are at 10Hz (the data release is just 5Hz), and Lyft drives areas with higher speed limits of 35mph+. I'm not sure if they're using lidar-based localization. For static objects, simple motion correction of the cloud (provided through the NuScenes / Lyft code) yields camera-lidar agreement as good as demonstrated above in the Waymo screenshots. For moving objects, though, as you can see, there can be quite a lot of drift.

Argoverse -- They have two lidars sweeping 180 degrees out of phase and fuse them into a single 10Hz scan, so their lidar-camera sync issue is potentially more complicated. However, I believe they might not exceed 25mph, and cameras are recorded at 30Hz. They recommend simple motion correction using ego pose data (I believe they're using lidar-based localization in the dataset), similar to what NuScenes does. Through this correction, and perhaps thanks to the high camera frame rate, I've seen very good camera-lidar agreement.

For each of the datasets above, getting a training artifact suitable for a detection or forecasting problem is relatively straightforward, and all of them also offer conversion to KITTI format. Furthermore, obtaining a point cloud in the ego frame is trivial and most certainly does not require the overhead I've seen with Waymo: (1) TensorFlow, (2) the large overhead of a TensorFlow graph and session, (3) 6-7 seconds of processing time on a modern Intel CPU.

None of the datasets above offer second return info. That said, all do offer examples of trees, precipitation, windows, etc.

xmyqsh commented 4 years ago

@pwais Thanks for sharing! Very useful information! But I can't agree with you entirely. For example, the Argoverse dataset uses two 16-line lidars, both sweeping 360 degrees, not 180. The purpose of these two overlaid lidars is presumably to emulate a 32-line lidar. And they likely use multi-sensor fusion localization, because the poses are very dense.

As for Waymo using a TensorFlow graph to process the point cloud: this lets you run it on a GPU, which numpy cannot. In terms of efficiency, among the three generations of deep learning frameworks (layer-based, graph/tensor-based, and program-based) plus numpy, I think the graph/tensor-based version is the fastest, provided you use a GPU/TPU.

pwais commented 4 years ago

@xmyqsh Argoverse: yes, both lidars do 360-degree sweeps, but they're 180 degrees out of phase, so camera sync is potentially more complicated. Sorry if that wasn't clear, but I think we're saying the same thing.

Using a GPU/TPU to decode point clouds has entertainment value only. It makes no sense to waste GPU memory and compute on a straightforward data transformation that will not change during the course of most typical experiments. Moreover, PyTorch is extremely popular now, so having such a key part of the Waymo code implemented in TensorFlow is a non-starter for many researchers. https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1efb/

peisun1115 commented 4 years ago
  1. We provide the raw range image for the TOP LiDAR and virtual range images for the other LiDARs. They can all be converted to the popular point-set (x, y, z, f1, f2, ...) format. The raw range image is useful for various tasks; if you don't find it useful for your research, just convert it to what you want. However, if we directly provided (x, y, z, f1, f2, ...), users could not go back to the raw range images.

  2. In terms of the speed of processing point clouds on CPU, try to implement it properly. Simply running our tutorial on the public sandbox (the machine is not very good; a powerful machine can do better), it takes 1.9 seconds (see screenshot). Not to mention that you can try it on GPU/TPU. If you have a faster implementation that decodes the same format, let us know. To step back, even if it is slow, it does not matter: just run a preprocessing job to convert the data to your format. It is a one-off cost.
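
A hedged sketch of such a one-off preprocessing job (paths and the frame_point_cloud wrapper sketched earlier are illustrative): decode each frame once and cache the merged point cloud in a plain format, so that training code never touches TensorFlow or the protos again.

import numpy as np
import tensorflow as tf

from waymo_open_dataset import dataset_pb2

def convert_segment(tfrecord_path, out_dir):
    # Iterate one segment's TFRecord, parse each Frame, and cache it as .npz.
    dataset = tf.data.TFRecordDataset(tfrecord_path, compression_type='')
    for i, data in enumerate(dataset):
        frame = dataset_pb2.Frame()
        frame.ParseFromString(bytes(data.numpy()))
        points = np.concatenate(frame_point_cloud(frame), axis=0)  # all lidars
        np.savez_compressed(f'{out_dir}/frame_{i:05d}.npz', points=points)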

xmyqsh commented 4 years ago

@pwais Argoverse: It is weird to sweep 360 degrees and only use 180 degrees out of phase. You can try to decouple them like this: (screenshots)

Also, Argoverse has solved the rolling-shutter problem: all the point clouds are provided in the vehicle frame and timestamped, and dense pose data is provided. It is easy to do motion compensation on it.

xmyqsh commented 4 years ago

@peisun1115 We can go back to the raw range image if you provide the raw point cloud. The difference is whether the rolling-shutter problem has been compensated for or not. Most datasets only provide the compensated, rolling-shutter-free point cloud, while Waymo also provides the raw, rolling-shutter-affected point cloud, together with the tools and data to compensate for the rolling-shutter problem.

The pity is that they only provide the tools and data to compensate for the rolling-shutter problem; users cannot develop an odometry or SLAM algorithm on the Waymo dataset to compensate for it themselves, because the cloud is truncated above a few meters and there is not much segment overlap, as @pwais said. This makes the dataset imperfect from some points of view.

xmyqsh commented 4 years ago

@peisun1115 @pwais What I want to know is this: in a real-time autonomous driving car, the lidar's output is the raw point cloud / raw range image, and our perception algorithm actually runs on that raw data. The virtual (accurate) point cloud can only be obtained by using SLAM to correct the point cloud, which may not be real-time.

So it is valuable to develop a real-time SLAM algorithm that combines localization, mapping / point cloud correction, and perception. That is the value of real-time SLAM methods in the autonomous driving setting.

It is a pity that this dataset cannot provide such a realistic environment, if what @pwais has said is true and this dataset cannot support SLAM.

It is good that this dataset exposes this problem through its data format. It is a good dataset for perception users to get strong results in a somewhat idealized (virtual) setting. It would be even better if we could develop real-time SLAM and perception algorithms together on it, as @pwais expects.

Waymo has presumably developed a real-time online point cloud compensation algorithm, right? @peisun1115