nianticlabs / monodepth2

[ICCV 2019] Monocular depth estimation from a single image

When reading a groundtruth depth map, why is it required to divide by 256 after converting the PIL.Image to an np.array? #473

Closed compvision-developer closed 1 year ago

compvision-developer commented 1 year ago

I don't know if this is a stupid question to ask.

But I can't work out why, in the code written by VictorCVision, the np.array is divided by 256 to get the depth map.

Also, what is the unit of depth (is it meters?) for these projected groundtruth PNG files? (e.g. KITTI_dataset/2011_09_26/2011_09_26_drive_0001_sync/proj_depth/groundtruth/image_02/0000000005.png)

@VictorCVision @mrharicot

daniyar-niantic commented 1 year ago

As mentioned in https://github.com/nianticlabs/monodepth2/issues/101#issuecomment-549660792, I suspect the depth map is saved as a 16-bit grayscale "image", so dividing by 256 would not rescale it to the [0, 1] range. However, I'm not completely sure why the loaded values are divided by 256. I would suggest checking the KITTI website.

daniyar-niantic commented 1 year ago

Hi @compvision-developer

The format used for the projected groundtruth PNG files follows the KITTI dataset documentation: https://github.com/joseph-zhong/KITTI-devkit#dataset-description
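
For reference, here is a minimal sketch of how that convention can be decoded. As far as I understand the KITTI depth devkit, depth maps are stored as uint16 PNGs, a value of 0 marks a pixel with no groundtruth, and any other value divided by 256 gives the depth in meters. The function names `decode_kitti_depth` and `load_kitti_depth` are made up for this illustration and are not part of monodepth2 or the devkit.

```python
import numpy as np


def decode_kitti_depth(raw):
    """Convert raw uint16 PNG values to depth in meters.

    Assumes the KITTI convention: depth_m = value / 256, 0 = no groundtruth.
    Returns (depth_m, valid), where valid is a boolean mask.
    """
    raw = np.asarray(raw)
    depth_m = raw.astype(np.float32) / 256.0
    valid = raw > 0
    return depth_m, valid


def load_kitti_depth(path):
    """Hypothetical helper: read a groundtruth PNG and decode it."""
    # Pillow is imported here so decode_kitti_depth can be used without it.
    from PIL import Image

    with Image.open(path) as img:
        raw = np.array(img)
    return decode_kitti_depth(raw)


# Synthetic raw values standing in for a real PNG.
raw = np.array([[0, 256], [2560, 65535]], dtype=np.uint16)
depth_m, valid = decode_kitti_depth(raw)
print(depth_m)  # raw 256 decodes to 1.0 m, raw 2560 to 10.0 m
print(valid)    # False only where the raw value is 0
```

With this encoding a uint16 can represent depths up to just under 256 m in steps of 1/256 m, which would explain why the divisor is 256 rather than 65535.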