The completion problem of moving objects (e.g., the very long car in the semantic scene completion groundtruth)

JennyVVV commented 4 years ago

Hi, thanks for sharing such a great dataset, I have a question about the completion of the moving objects, for example, the moving cars.

In your given labels, the moving cars will be reconstructed to the very long cars, this may be because you used 70 frames of data to complete the scene, but this is not consistent with common sense, the input is only a normal size car, while the label is a very long car. What I want is the completed output is also a normal size car. What should I do? Hope for your reply and thank you in advance.

jbehley commented 4 years ago

Thanks for your interest in the dataset.

We had also discussions what is the best way to handle moving objects and we finally decided to have moving objects represented in the current way. The current representation allows (or requires) to predict the future trajectory of the moving object and this information can be integrated in planning or other tasks that need this information which paths might be blocked. Maybe @mgarbade can provide more insights about this decision, since he's the expert on semantic scene completion.

However, it should be simple to modify the voxelizer to consider only the current timestamp and ignore labels from future moving objects. One could also try to predict how the scene looks in 0.5 secs, 1 sec, ... or even make a regression. As you see there are many ways to frame the task and every interpretation might be compatible with "common sense."

Nevertheless, I guess that only learning with static cars should be enough to learn how to complete cars, since there are also plenty of cars on the street parked.

mgarbade commented 4 years ago

@jbehley is right. We discussed quite a lot about this issue. First idea was to fit CAD models to all dynamic cars in the scene. However the fits didn't look very good / plausible so we discarded that idea. The current setup represents moving cars as "spatio temporal tubes". These tubes encode driving direction as well as speed of the observed dynamic car. We found this to be a practical output for a task like autonomous driving, though it does not lead to a realistic reconstruction of the scene (as desired in a 3D reconstruction task). Another argument for the "spatio temporal tubes" is that they represent a more significant signal in the target for learning. Looking at our own scene completion results, however, we observed that due to the abundance of "parking cars" in the dataset, our algorithm tends to reconstruct dynamic cars just like static cars, which looks good but is currently punished by our metric. Another option would have been to leave out all dynamic objects for training and evaluation, which we discarded since our focus is on autonomous driving and not realistic 3D reconstruction.

JennyVVV commented 4 years ago

@jbehley is right. We discussed quite a lot about this issue. First idea was to fit CAD models to all dynamic cars in the scene. However the fits didn't look very good / plausible so we discarded that idea. The current setup represents moving cars as "spatio temporal tubes". These tubes encode driving direction as well as speed of the observed dynamic car. We found this to be a practical output for a task like autonomous driving, though it does not lead to a realistic reconstruction of the scene (as desired in a 3D reconstruction task). Another argument for the "spatio temporal tubes" is that they represent a more significant signal in the target for learning. Looking at our own scene completion results, however, we observed that due to the abundance of "parking cars" in the dataset, our algorithm tends to reconstruct dynamic cars just like static cars, which looks good but is currently punished by our metric. Another option would have been to leave out all dynamic objects for training and evaluation, which we discarded since our focus is on autonomous driving and not realistic 3D reconstruction.

Hi, @jbehley @mgarbade thank you both for the detailed reply. I have fully understood your motivation for presenting dynamic objects in the current way. And I also modified the voxelizer to get the results I wanted. Thank you again.

jbehley / voxelizer

The completion problem of moving objects (e.g., the very long car in the semantic scene completion groundtruth) #2