Downsampling KITTI ground truth

lmb-freiburg / flownet2

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

https://lmb.informatik.uni-freiburg.de/Publications/2017/IMKDB17/

Downsampling KITTI ground truth #156

Closed by saxenarohan97 6 years ago

saxenarohan97 commented 6 years ago

Hi,

As you mention here, the Downsample layer downsamples the ground-truth blob to the size of the predicted-flow blob. Won't this be a problem when fine-tuning on KITTI images?

KITTI images only contain sparse ground-truth, with invalid pixels containing zero disparity values. If you resize the ground truth, won't it mess with the semantics of this labelling? What is the disadvantage of upsampling the predicted-flow to the ground-truth's resolution - is it the increased computation for processing larger resolutions?

nikolausmayer commented 6 years ago

Invalid pixels are represented as NaN within the network, not zero. When downsampling sparse data like KITTI, invalid pixels are ignored (see downsample_layer.cu#L52). This is the best you can do; there is no more information in that data to work on, and you have no way to encode subpixel information.

What is the disadvantage of upsampling the predicted-flow to the ground-truth's resolution [...]?

The computational load is not that relevant. But from a data perspective, upsampling an optical flow or disparity image is no better than downsampling another. In fact, downsampling KITTI images makes the data appear denser! Downsampling does not destroy the data semantics because invalid pixels are ignored.
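To illustrate the two claims above (invalid pixels are ignored, and the downsampled data appears denser), here is a minimal pure-Python sketch. It is not the code in downsample_layer.cu: the function names, the plain block average, and the toy 4x4 grid are all assumptions made for illustration, and values are treated as generic scalars.

```python
import math

NAN = math.nan


def downsample_ignore_nan(grid, factor):
    """Average each factor x factor block, skipping NaN (invalid) entries.

    Hypothetical helper for illustration; not the FlowNet2 layer itself.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - factor + 1, factor):
        out_row = []
        for c in range(0, cols - factor + 1, factor):
            valid = [
                grid[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
                if not math.isnan(grid[r + dr][c + dc])
            ]
            # A block without any valid pixel stays invalid.
            out_row.append(sum(valid) / len(valid) if valid else NAN)
        out.append(out_row)
    return out


def valid_fraction(grid):
    """Fraction of entries that are not NaN."""
    cells = [v for row in grid for v in row]
    return sum(not math.isnan(v) for v in cells) / len(cells)


# Toy sparse ground truth: only 4 of 16 pixels carry a value.
sparse = [
    [NAN, 4.0, NAN, NAN],
    [NAN, NAN, NAN, NAN],
    [2.0, NAN, NAN, 8.0],
    [NAN, 6.0, NAN, NAN],
]
coarse = downsample_ignore_nan(sparse, 2)

print(valid_fraction(sparse))  # 0.25
print(valid_fraction(coarse))  # 0.75
```

In this toy case a quarter of the full-resolution pixels are valid, while three of the four downsampled pixels are, and no invalid pixel contributes to any average.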

saxenarohan97 commented 6 years ago

Invalid pixels are represented as NaN within the network, not zero.

I see. Since the raw KITTI data uses zero disparity values as a marker for invalidity, I thought that is what is also used in the network. I understand the downsampling layer now, thanks.
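The difference between the two markers can be sketched as a small conversion step. This is only an illustration under the assumption that the raw disparity map uses 0 for invalid pixels, as described above; `mark_invalid_as_nan` is a hypothetical helper, not a function from the FlowNet2 code.

```python
import math


def mark_invalid_as_nan(disparity_rows):
    """Replace the raw zero marker with NaN so invalid pixels cannot be
    mistaken for real measurements (hypothetical helper)."""
    return [
        [math.nan if d == 0 else float(d) for d in row]
        for row in disparity_rows
    ]


raw = [[0, 12.5, 0], [3.0, 0, 7.25]]
converted = mark_invalid_as_nan(raw)
```

With zeros left in place, any averaging during resizing would pull valid values toward zero; with NaN, the invalid entries can be detected and skipped.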

MrRoboticist commented 6 years ago

@nikolausmayer related questions: which resizing algorithm does the Downsample layer use? Is it bilinear interpolation?

Also, what is the Resample layer, and how is it different from the Downsample layer?

nikolausmayer commented 6 years ago

@MrRoboticist Downsample uses (IIRC) bilinear interpolation with a threshold: if the source contains too many NaN values, the output is NaN as well (see downsample_layer.cu#L63).
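A rough sketch of what "bilinear interpolation with a threshold" could look like for a single output sample. The exact rule in downsample_layer.cu may differ: the function name, the `min_valid_weight` parameter, and its default of 0.5 are assumptions for illustration only.

```python
import math


def bilinear_sample_with_threshold(grid, y, x, min_valid_weight=0.5):
    """Bilinearly interpolate grid at (y, x), ignoring NaN neighbours.

    If the valid neighbours carry less than min_valid_weight of the total
    interpolation weight, the result is NaN. The threshold value is a
    hypothetical choice, not taken from the FlowNet2 source.
    """
    rows, cols = len(grid), len(grid[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    fy, fx = y - y0, x - x0
    total, valid_weight = 0.0, 0.0
    for dy, wy in ((0, 1.0 - fy), (1, fy)):
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            # Clamp neighbour coordinates to the grid borders.
            yy = min(max(y0 + dy, 0), rows - 1)
            xx = min(max(x0 + dx, 0), cols - 1)
            value = grid[yy][xx]
            if not math.isnan(value):
                total += wy * wx * value
                valid_weight += wy * wx
    if valid_weight < min_valid_weight:
        return math.nan
    # Renormalise so that only the valid neighbours contribute.
    return total / valid_weight


patch = [
    [1.0, math.nan],
    [3.0, math.nan],
]
mostly_valid = bilinear_sample_with_threshold(patch, 0.5, 0.25)    # 2.0
mostly_invalid = bilinear_sample_with_threshold(patch, 0.5, 0.75)  # NaN
```

The first sample sits close to the valid column, so the valid neighbours hold 0.75 of the weight and the result is their renormalised average. The second sits close to the NaN column, the valid weight drops to 0.25, and the output is marked invalid.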

Resample supports multiple algorithms, but has no backward pass. We use it for up- and downsampling in a deployed network. It is not used during training.