irolaina / FCRN-DepthPrediction

Deeper Depth Prediction with Fully Convolutional Residual Networks (FCRN)
BSD 2-Clause "Simplified" License

I test your tf-model metrics-error and the result can not reach 81% #45

Open JackHenry1992 opened 6 years ago

JackHenry1992 commented 6 years ago

Hi, @iro-cp I have tested your results on the NYU labeled dataset (using the 654 test images and the tf-model provided by you): http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat


Here is my error calculation method (a1 corresponds to your delta threshold in the paper):

    # threshold accuracy: fraction of pixels with max(gt/pred, pred/gt) < 1.25^k
    thresh = np.maximum((gt / pred), (pred / gt))
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    # root mean squared error, in linear and log scale
    rmse = np.sqrt(((gt - pred) ** 2).mean())
    rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())

    # absolute and squared relative error
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
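For reference, the metrics above can be wrapped into a single function and sanity-checked on synthetic data (the function name `compute_errors` is my own, not from the repository; a perfect prediction must give a1 = 1 and zero error):

```python
import numpy as np

def compute_errors(gt, pred):
    """Standard depth metrics over flattened arrays of valid (positive) depths."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    rmse = np.sqrt(((gt - pred) ** 2).mean())
    rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    return a1, a2, a3, rmse, rmse_log, abs_rel, sq_rel

# sanity check: identical gt and pred give a1 = 1.0 and zero RMSE
gt = np.array([1.0, 2.0, 3.0])
a1, a2, a3, rmse, rmse_log, abs_rel, sq_rel = compute_errors(gt, gt.copy())
assert a1 == 1.0 and rmse == 0.0
```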

The result (77%) falls short of the numbers in your paper. Did you test the errors on the tf-model? Maybe there is some difference in my evaluation script (the code that feeds the image to the network is the same as your released code):

img = Image.open(image_path)
img_resize = img.resize([width, height], Image.ANTIALIAS)
img_resize = np.array(img_resize).astype('float32')
img_resize_expanded = np.expand_dims(np.asarray(img_resize), axis=0)  # add batch dimension
irolaina commented 6 years ago

Hi, the original model was trained in MatConvNet. We later converted its parameters to TensorFlow. The predictions from the converted model were also tested with the MATLAB script that we provide, so the conversion itself was evaluated. We have not tried testing with a Python script, but I cannot imagine a reason why the results should differ (except for Issue #42). I will look into this further.

ghost commented 6 years ago

That's exactly what I got when testing the TensorFlow model. Did you find out what's wrong?

lukasliebel commented 6 years ago

Hi,

page 7 of the original paper states:

Following [6], the original frames of size 640x480 pixels are down-sampled to 1/2 resolution and center-cropped to 304x228 pixels, as input to the network.

Did you realize that this step is missing from the provided TF code (probably because it's unrelated to predicting on arbitrary images)? The corresponding part of the TF code scales the input image directly to 304x228:

height = 228
width = 304

# [...]

img = Image.open(image_path)
img = img.resize([width,height], Image.ANTIALIAS)
img = np.array(img).astype('float32')

I would recommend replacing this with something like:

height = 228
width = 304

# [...]

img = Image.open(image_path)
img = img.resize([320, 240], Image.ANTIALIAS)  # down-sample 640x480 to 1/2 resolution
img = np.array(img).astype('float32')
img = img[6:-6, 8:-8, :]  # center-crop 320x240 to 304x228

This is not a very nice implementation but I hope you get my point.
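To make the crop arithmetic explicit: halving 640x480 gives 320x240, and a center crop to 304x228 trims (320 - 304) / 2 = 8 columns and (240 - 228) / 2 = 6 rows on each side. A parameterized sketch of the same procedure (the helper name `resize_and_center_crop` is mine; `Image.LANCZOS` is the newer Pillow name for `ANTIALIAS`):

```python
import numpy as np
from PIL import Image

def resize_and_center_crop(img, resize_wh=(320, 240), crop_wh=(304, 228)):
    """Down-sample a PIL image to resize_wh, then center-crop to crop_wh (width, height)."""
    img = img.resize(resize_wh, Image.LANCZOS)
    arr = np.array(img).astype('float32')
    dy = (resize_wh[1] - crop_wh[1]) // 2  # rows trimmed top and bottom (6 here)
    dx = (resize_wh[0] - crop_wh[0]) // 2  # columns trimmed left and right (8 here)
    return arr[dy:resize_wh[1] - dy, dx:resize_wh[0] - dx]

# a 640x480 frame ends up as a 228x304 array, matching the network input
out = resize_and_center_crop(Image.new('RGB', (640, 480)))
assert out.shape == (228, 304, 3)
```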

I haven't predicted and quantitatively evaluated the full NYU test set using both versions yet. However, for some NYU images I tested this on, the results differed visibly!

lukasliebel commented 6 years ago

I looked it up in our paper (see Table 5 in the supplementary material, page 14).

I was using the resize/crop procedure shown in my previous post, which yielded a final result of 0.8164 for the respective metric. Note that while the prediction was done using the modified TensorFlow implementation, the errors were calculated in MATLAB. So there's still a chance that your slightly differing results are caused by other MATLAB/TF differences.

Hope this helps!