zzangjinsun / NLSPN_ECCV20

Park et al., Non-Local Spatial Propagation Network for Depth Completion, ECCV, 2020
MIT License

Question about NYUdepthv2 Dataset #40

Closed mmmcn closed 2 years ago

mmmcn commented 2 years ago

Hi, thanks for sharing this great work!

The NYU Depth v2 test set contains depths (missing depth values filled in) and raw depths, both of which I can get from the Labeled dataset (~2.8 GB) provided by NYU. When I test the model, the depth maps I get from the .h5 files (provided by you) seem to correspond to the reconstructed depths (missing depth values filled in). I confirmed this by visualizing some depth maps and by running the test script separately, which gave the same evaluation results (RMSE: 0.0919, MAE: 0.0337, iRMSE: 0.0137, iMAE: 0.0047, REL: 0.0113, D^1: 0.9955, D^2: 0.9993, D^3: 0.9998). But when I run the test on the raw depths, i.e., sampling the sparse depth from the raw depth map and using the raw depth as the GT depth, I get RMSE: 0.4594, MAE: 0.1368, iRMSE: 25.1806, iMAE: 0.6456, REL: 1.3616, D^1: 0.9357, D^2: 0.9547, D^3: 0.9645, which looks bad.

Is there any difference between the depth map used during training and the raw depth map in test set? Thanks in advance.
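For reference, here is how I inspected the two depth variants (a minimal sketch; it assumes h5py, the official nyu_depth_v2_labeled.mat file, and that holes in the raw depth are encoded as 0):

```python
import numpy as np

def depth_stats(depth):
    """Value range and fraction of empty (zero) pixels of a depth map."""
    return float(depth.min()), float(depth.max()), float((depth == 0).mean())

# In practice, load the maps from the labeled dataset (MATLAB v7.3 / HDF5):
# import h5py
# with h5py.File('nyu_depth_v2_labeled.mat', 'r') as f:
#     raw = np.array(f['rawDepths'])   # raw Kinect depth, holes expected at 0
#     filled = np.array(f['depths'])   # cross-bilateral in-painted depth
#     print(depth_stats(raw), depth_stats(filled))

# Toy check: a raw-style map with two holes
raw_like = np.array([[0.0, 1.2], [2.5, 0.0]])
print(depth_stats(raw_like))  # (0.0, 2.5, 0.5)
```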

zzangjinsun commented 2 years ago

Hi, I think you'd better first check the RGB and depth value ranges of the h5 and raw data (i.e., [0, 1] or [0, 255]).

mmmcn commented 2 years ago

I have checked the RGB and depth values, and I don't think that's the problem, because I get reasonable results when using the filled depths (missing depth values filled in) for testing.

As Fangchang Ma and Sertac Karaman mentioned in their paper Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image:

For training, we sample spatially evenly from each raw video sequence from the training dataset, generating roughly 48k synchronized depth-RGB image pairs. The depth values are projected onto the RGB image and in-painted with a cross-bilateral filter using the official toolbox. Following [3, 13], the original frames of size 640×480 are first downsampled to half and then center-cropped, producing a final size of 304×228.
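The preprocessing described in that passage can be sketched roughly as follows (my own sketch; I use naive 2x subsampling, whereas the official pipeline does proper resizing):

```python
import numpy as np

def downsample_and_center_crop(img, out_h=228, out_w=304):
    """Halve a 480x640 frame, then center-crop to 228x304 (S2D-style)."""
    half = img[::2, ::2]  # naive 2x subsampling
    h, w = half.shape[:2]
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return half[top:top + out_h, left:left + out_w]

frame = np.zeros((480, 640), dtype=np.float32)
print(downsample_and_center_crop(frame).shape)  # (228, 304)
```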

So the depth maps used for training and testing are clearly both the GT depth (processed depth). Therefore, when I use the raw depth in the test set for sparse-to-dense prediction, I get terrible results, presumably due to the difference in data distribution (though I'm not sure that's the reason).

I've only recently started investigating depth completion research. As far as I know, methods like S2D, CSPN, and NLSPN all avoid using the raw depth map. Is there anything wrong with my understanding?

zzangjinsun commented 2 years ago

NYU raw depth maps have holes and inaccurate values, as shown in the attached figure.

Moreover, depth completion works follow the training/evaluation scheme proposed in S2D; therefore, we use 200 or 500 points sampled from the GT depth map as the input.
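The S2D-style sampling is roughly this (a sketch, not the exact repository code; it assumes invalid pixels are stored as 0 in the GT map):

```python
import numpy as np

def sample_sparse_depth(gt, num_samples=500, seed=0):
    """Keep num_samples randomly chosen valid (non-zero) GT pixels, zero out the rest."""
    rng = np.random.default_rng(seed)
    valid = np.flatnonzero(gt > 0)
    keep = rng.choice(valid, size=min(num_samples, valid.size), replace=False)
    sparse = np.zeros_like(gt)
    sparse.flat[keep] = gt.flat[keep]
    return sparse

gt = np.random.default_rng(1).uniform(0.5, 10.0, size=(228, 304))
sparse = sample_sparse_depth(gt, num_samples=500)
print(int((sparse > 0).sum()))  # 500
```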

In the meantime, why don't you check whether the holes in the raw depth are exactly 0 or contain some (erroneous) low pixel values?

If the holes are not 0, the evaluation results will be totally wrong: https://github.com/zzangjinsun/NLSPN_ECCV20/blob/ba33fa5d9ea62ca970026a145ab18fab76d79d4a/src/metric/nlspnmetric.py#L31-L40
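For illustration, here is a toy version of why that matters (a simplified sketch, not the exact nlspnmetric.py code; the threshold value is an assumption). The metric only excludes pixels whose GT value is at or below a small validity threshold, so holes stored as small non-zero values get counted as valid ground truth and blow up the error:

```python
import numpy as np

def rmse(pred, gt, t_valid=0.0001):
    """RMSE over pixels treated as valid (gt > t_valid); simplified illustration."""
    mask = gt > t_valid
    err = pred[mask] - gt[mask]
    return float(np.sqrt((err ** 2).mean()))

pred   = np.array([1.0, 2.0, 1.5])
gt_ok  = np.array([1.0, 2.0, 0.0])   # hole encoded as exactly 0 -> excluded
gt_bad = np.array([1.0, 2.0, 0.05])  # hole as a small non-zero -> included!
print(rmse(pred, gt_ok))   # 0.0
print(rmse(pred, gt_bad))  # ~0.84, dominated by the bogus "valid" pixel
```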

mmmcn commented 2 years ago

ok, thanks for your advice, I will check it.