Question about FlyingData3D background depth

lmb-freiburg / flownet2

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

https://lmb.informatik.uni-freiburg.de/Publications/2017/IMKDB17/

Question about FlyingData3D background depth #122

Closed shopkauf closed 6 years ago

shopkauf commented 6 years ago

May I ask whether the background of each image is planar or contains complicated depth? The paper says "The base of each scene is a large textured ground plane", so it seems to be planar. But it also says "We generated 200 static background objects". Thanks.

nikolausmayer commented 6 years ago

The 3D scenes contain:

- a flat textured plane that serves as "ground" reference
- a hemispherical textured skydome
- 200 static objects (simple geometric solids) placed somewhere on the flat ground (we refer to them as "background" because they are further away from the camera than the moving foreground objects)
- a random number of dynamic foreground objects with complex geometry

Here's an example. The background is simple geometric shapes, but not just flat. Whether that makes for "complicated" depth is in the eye of the beholder :)

[Images: finalpass_0006_l, disp_0006_l]
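For readers who want to check a given disparity map for themselves, here is a rough sketch rather than anything from the dataset tools. It relies on the fact that, in a rectified stereo pair, a planar surface produces disparity that is an affine function of pixel coordinates (d = a*x + b*y + c). So a least-squares plane fit with a large residual indicates non-planar geometry. The function names (`det3`, `solve3`, `plane_residual`) and the synthetic 8x6 maps are assumptions made for illustration only.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = rhs[i]
        solution.append(det3(mk) / d)
    return solution


def plane_residual(disp):
    """Fit d ~ a*x + b*y + c by least squares; return the max absolute residual."""
    h, w = len(disp), len(disp[0])
    sxx = sxy = sx = syy = sy = n = 0.0
    sxd = syd = sd = 0.0
    # Accumulate the normal equations for the unknowns (a, b, c).
    for y in range(h):
        for x in range(w):
            d = disp[y][x]
            sxx += x * x
            sxy += x * y
            sx += x
            syy += y * y
            sy += y
            n += 1
            sxd += x * d
            syd += y * d
            sd += d
    a, b, c = solve3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]],
                     [sxd, syd, sd])
    return max(abs(disp[y][x] - (a * x + b * y + c))
               for y in range(h) for x in range(w))


# Synthetic example: a tilted ground plane, then the same plane with a
# small box-like object (a 2x2 patch that is closer to the camera).
flat = [[0.5 * x + 0.25 * y + 3.0 for x in range(8)] for y in range(6)]
with_box = [row[:] for row in flat]
for y in (2, 3):
    for x in (3, 4):
        with_box[y][x] += 5.0

flat_residual = plane_residual(flat)
box_residual = plane_residual(with_box)
```

On the synthetic data, `flat_residual` is essentially zero while `box_residual` is several disparity units. Any threshold for real data would have to be chosen by the reader.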

shopkauf commented 6 years ago

Thanks so much for the quick reply. Let's say there is a flat area in the background that is textured with tree leaves. This could wrongly teach the algorithm that leaves have uniform depth. So I guess one must be careful in choosing the texture for the flat pixels?

nikolausmayer commented 6 years ago

That is definitely valid if you are doing depth-from-single-image.

In our setting, I'd actually claim that the opposite is true: If all our flat objects had "flat" textures, then the network would fail e.g. when looking at a magazine or a poster or a painting. It would see structure and think "oh there's structure here, so this thing can't be flat!".

We try to remove as much bias as possible. Our dataset contains both untextured and textured objects, with both simple and complex geometry. To perform well on both complex geometry and flat objects with "leaves" textures, the network must learn to infer depth from actual stereo matching instead of priors.
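To illustrate the matching argument, here is a toy sketch. It uses classical block matching on a single scanline, not the network discussed in this thread. A flat surface with a busy random texture still yields one constant disparity, because the estimate comes from where the texture lines up between the two views and not from what the texture looks like. The function name `match_scanline`, the window size, and the synthetic scanlines are all assumptions made for this example.

```python
import random


def match_scanline(left, right, max_disp, half=2):
    """For each left pixel, pick the disparity with the lowest SAD window cost."""
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0
            for k in range(-half, half + 1):
                # Clamp indices at the scanline borders.
                xl = min(max(x + k, 0), n - 1)
                xr = min(max(x + k - d, 0), n - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# A fronto-parallel flat surface with a busy, "leaf-like" random texture.
rng = random.Random(0)
true_disp = 3
n = 64
texture = [rng.randrange(256) for _ in range(n + true_disp)]
left = texture[:n]
right = texture[true_disp:true_disp + n]  # left[x] == right[x - true_disp]

estimated = match_scanline(left, right, max_disp=8)
# Ignore border pixels, where the matching window is clamped.
interior = estimated[true_disp + 2:n - 2]
```

Away from the borders, every pixel in `interior` comes out at the same disparity, so the strongly structured texture does not make the surface look non-flat to a matching-based estimate.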

nikolausmayer commented 6 years ago

(closed due to inactivity)