Closed: 2020namemyself2020 closed this issue 1 year ago
Hello, sorry to interrupt you.
The design choice follows the MonoDepth paper. The underlying idea is to let the network output a disparity in the range [0, 0.3 * image_width].
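As a minimal sketch of that design, the raw network output is squashed by a sigmoid into [0, 1], scaled by 0.3 to get a normalized disparity, and multiplied by the image width to get pixel units. The logit values and the image width below are illustrative assumptions, not values from the repo:

```python
import numpy as np

def sigmoid(x):
    # Numerically plain sigmoid; in the repo this would be self.sigmoid(...)
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw network outputs (logits) for four disparity pixels
logits = np.array([-2.0, 0.0, 2.0, 6.0])

image_width = 1242  # typical KITTI image width, used here as an assumption

# Normalized disparity in [0, 0.3], as in `self.sigmoid(disp_l1) * 0.3`
disp_norm = sigmoid(logits) * 0.3

# Converting to pixel units bounds the disparity by 0.3 * image_width
disp_px = disp_norm * image_width
```

Whatever the logits are, `disp_px` can never exceed 0.3 * image_width, which caps the representable disparity (and hence the minimum representable depth).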
Yes, your point is absolutely right. The learned scale is only for the KITTI dataset and doesn't generalize to other datasets with different focal lengths. The estimation then would be only up to scale and/or up to shift.
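For evaluating such up-to-scale (and/or up-to-shift) predictions on a dataset with a different focal length, a common workaround is to fit a least-squares scale and shift that aligns the prediction to the ground truth before computing metrics. This is a generic technique, not something from this repo; the function name and toy data below are my own:

```python
import numpy as np

def align_scale_shift(pred, gt):
    """Least-squares scale s and shift t such that s * pred + t ~= gt."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    s, t = np.linalg.lstsq(A, gt, rcond=None)[0]
    return s, t

# Toy example: gt is an exact affine transform of pred, so the fit recovers it
pred = np.array([0.1, 0.2, 0.4, 0.8])
gt = 2.5 * pred + 0.3
s, t = align_scale_shift(pred, gt)
```

After alignment, `s * pred + t` can be compared against `gt` even though the raw prediction was only correct up to scale and shift.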
Thank you sincerely for your explanation!
I notice that, for every layer except one, the disparity is limited to [0, 0.3] by `self.sigmoid(disp_l1) * 0.3`. Why is it designed this way? I know the ground-truth disparity is limited to [0, 1] by the function `_read_pngdisp` in `common.py`. I think this may not be appropriate for other datasets such as FlyingThings3D, where this normalization is difficult to apply because the disparity is not stored in the 0-255 range. Without normalization, will the result be worse?
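One way around the 0-255 assumption is to normalize the ground-truth disparity by the image width instead of by a fixed 8-bit value range, so that it lands in [0, 1] regardless of how the dataset stores it. This is only a sketch of that idea, not the repo's `_read_pngdisp`; the function name and the FlyingThings3D image width (960) are assumptions on my part:

```python
import numpy as np

def normalize_disparity(disp_px, image_width):
    """Hypothetical normalization: map pixel-unit disparity to [0, 1]
    by dividing by the image width, rather than assuming 0-255 values."""
    return np.clip(disp_px / float(image_width), 0.0, 1.0)

# FlyingThings3D stores disparity in pixels as floats; these values are made up
disp_px = np.array([0.0, 50.0, 200.0, 287.0])
disp_norm = normalize_disparity(disp_px, image_width=960)
```

This keeps the target in the same [0, 1] range the network's sigmoid output expects, at the cost of making the normalization depend on the image width of each dataset.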