visinf / self-mono-sf

Self-Supervised Monocular Scene Flow Estimation (CVPR 2020)
Apache License 2.0

About network architecture #22

Closed 2020namemyself2020 closed 1 year ago

2020namemyself2020 commented 1 year ago

I notice that for every layer except, the disparity is limited to the range 0-0.3 by “self.sigmoid(disp_l1) * 0.3”. Why is it designed this way? I know the ground-truth disparity is limited to 0-1 by the function _read_pngdisp in common.py. I think this may not be appropriate for other datasets such as FlyingThings3D, because normalization is difficult when the disparity does not lie between 0 and 255. Without normalization, will the results be worse?

2020namemyself2020 commented 1 year ago

Hello, sorry to bother you.

hurjunhwa commented 1 year ago

The design choice follows the MonoDepth paper. The underlying idea is to let the network output a disparity in the range [0, 0.3 * image_width].
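To make the range concrete, here is a minimal sketch of the bounding step in plain Python rather than PyTorch. Only the expression sigmoid(x) * 0.3 comes from the thread; the function names and the image width used below are hypothetical and not taken from the repository.

```python
import math

# From the thread: the sigmoid output is scaled by 0.3.
MAX_DISP_FRACTION = 0.3


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def normalized_disparity(logit: float) -> float:
    """Map an unbounded network output to a disparity in [0, 0.3].

    The result is a fraction of the image width, not a pixel count.
    """
    return sigmoid(logit) * MAX_DISP_FRACTION


def pixel_disparity(logit: float, image_width: int) -> float:
    """Convert the normalized disparity to pixels: [0, 0.3 * image_width]."""
    return normalized_disparity(logit) * image_width


if __name__ == "__main__":
    width = 832  # hypothetical image width, for illustration only
    for logit in (-10.0, 0.0, 10.0):
        print(logit, normalized_disparity(logit), pixel_disparity(logit, width))
```

However large or small the raw output is, the disparity can never exceed 30% of the image width, which is the bound described above.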

Yes, your point is absolutely right. The learned scale is specific to the KITTI dataset and doesn't generalize to other datasets with different focal lengths. On such datasets, the estimate would be correct only up to scale and/or shift.
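The dependence on focal length can be illustrated with the standard rectified-stereo relation depth = focal_length * baseline / disparity. The sketch below is not the repository's code: the function name is hypothetical, and the camera values (a focal length of about 721.5 px and a baseline of about 0.54 m for a KITTI-like setup, 1050 px for a second camera) are approximate, illustrative assumptions.

```python
def disparity_to_depth(norm_disp: float, image_width: int,
                       focal_px: float, baseline_m: float) -> float:
    """Convert a width-normalized disparity to metric depth.

    Uses the rectified-stereo relation depth = f * B / d, where d is the
    disparity in pixels (norm_disp * image_width).
    """
    disp_px = norm_disp * image_width
    if disp_px <= 0.0:
        raise ValueError("disparity must be positive to compute depth")
    return focal_px * baseline_m / disp_px


if __name__ == "__main__":
    norm_disp = 0.05   # same network output in both cases
    width = 1242       # illustrative image width

    # Approximate KITTI-like camera (assumed values).
    depth_a = disparity_to_depth(norm_disp, width, focal_px=721.5, baseline_m=0.54)
    # Hypothetical camera with a longer focal length.
    depth_b = disparity_to_depth(norm_disp, width, focal_px=1050.0, baseline_m=0.54)

    print(f"camera A depth: {depth_a:.2f} m")
    print(f"camera B depth: {depth_b:.2f} m")
```

The same predicted disparity corresponds to different metric depths under the two cameras, so a network that has absorbed one camera's scale would be off by a scale factor on the other.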

2020namemyself2020 commented 1 year ago

Thank you sincerely for your explanation!