irolaina / FCRN-DepthPrediction

Deeper Depth Prediction with Fully Convolutional Residual Networks (FCRN)
BSD 2-Clause "Simplified" License

I test your tf-model metrics-error and the result can not reach 81% #45

Open JackHenry1992 opened 6 years ago

JackHenry1992 commented 6 years ago

Hi, @iro-cp I have tested your results on the NYU labeled dataset (using the 654 test images and the tf-model provided by you): http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat


Here is my error calculation method (a1 corresponds to your delta threshold in the paper):

    # threshold accuracy: fraction of pixels with max(gt/pred, pred/gt) < 1.25^k
    thresh = np.maximum((gt / pred), (pred / gt))
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    # root mean squared error, in linear and log scale
    rmse = np.sqrt(((gt - pred) ** 2).mean())
    rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())

    # absolute and squared relative error
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
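For reference, the metrics above can be wrapped into a single function and sanity-checked on synthetic data (the function name `compute_errors` is my own, not from the repository; a perfect prediction must give a1 = 1 and zero error):

```python
import numpy as np

def compute_errors(gt, pred):
    """Standard depth metrics over flattened arrays of valid (positive) depths."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    rmse = np.sqrt(((gt - pred) ** 2).mean())
    rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    return a1, a2, a3, rmse, rmse_log, abs_rel, sq_rel

# sanity check: identical gt and pred give a1 = 1.0 and zero RMSE
gt = np.array([1.0, 2.0, 3.0])
a1, a2, a3, rmse, rmse_log, abs_rel, sq_rel = compute_errors(gt, gt.copy())
assert a1 == 1.0 and rmse == 0.0
```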

The result (77%) falls short of the numbers in your paper. Did you test the errors on the tf-model? Maybe there is some difference in my evaluation script (the code that feeds the image to the network is the same as your released code):

img = Image.open(image_path)
img_resize = img.resize([width, height], Image.ANTIALIAS)
img_resize = np.array(img_resize).astype('float32')
img_resize_expanded = np.expand_dims(np.asarray(img_resize), axis=0)  # add batch dimension
irolaina commented 6 years ago

Hi, the original model was trained in MatConvNet. We later converted its parameters to TensorFlow. The predictions from the converted model were also tested with the MATLAB script that we provide, so the conversion itself was evaluated. We have not tried testing with a Python script, but I cannot imagine a reason why the results should differ (except for Issue #42). I will look into this further.

ghost commented 6 years ago

That's exactly what I got when testing the TensorFlow model. Did you find out what's wrong?

lukasliebel commented 6 years ago

Hi,

page 7 of the original paper states:

Following [6], the original frames of size 640x480 pixels are down-sampled to 1/2 resolution and center-cropped to 304x228 pixels, as input to the network.

Did you realize that this step is missing from the provided TF code (probably because it's unrelated to predicting on arbitrary images)? The corresponding part of the TF code scales the input image directly to 304x228:

height = 228
width = 304

# [...]

img = Image.open(image_path)
img = img.resize([width,height], Image.ANTIALIAS)
img = np.array(img).astype('float32')

I would recommend replacing this with something like:

height = 228
width = 304

# [...]

img = Image.open(image_path)
img = img.resize([320, 240], Image.ANTIALIAS)  # down-sample 640x480 to 1/2 resolution
img = np.array(img).astype('float32')
img = img[6:-6, 8:-8, :]  # center-crop 320x240 to 304x228

This is not a very nice implementation but I hope you get my point.
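To make the crop arithmetic explicit: halving 640x480 gives 320x240, and a center crop to 304x228 trims (320 - 304) / 2 = 8 columns and (240 - 228) / 2 = 6 rows on each side. A parameterized sketch of the same procedure (the helper name `resize_and_center_crop` is mine; `Image.LANCZOS` is the newer Pillow name for `ANTIALIAS`):

```python
import numpy as np
from PIL import Image

def resize_and_center_crop(img, resize_wh=(320, 240), crop_wh=(304, 228)):
    """Down-sample a PIL image to resize_wh, then center-crop to crop_wh (width, height)."""
    img = img.resize(resize_wh, Image.LANCZOS)
    arr = np.array(img).astype('float32')
    dy = (resize_wh[1] - crop_wh[1]) // 2  # rows trimmed top and bottom (6 here)
    dx = (resize_wh[0] - crop_wh[0]) // 2  # columns trimmed left and right (8 here)
    return arr[dy:resize_wh[1] - dy, dx:resize_wh[0] - dx]

# a 640x480 frame ends up as a 228x304 array, matching the network input
out = resize_and_center_crop(Image.new('RGB', (640, 480)))
assert out.shape == (228, 304, 3)
```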

I haven't predicted and quantitatively evaluated the full NYU test set using both versions yet. However, for some NYU images I tested this on, the results differed visibly!

lukasliebel commented 6 years ago

I looked it up in our paper (see Table 5 in the supplementary material, page 14).

I was using the resize/crop procedure shown in my previous post, which yielded a final result of 0.8164 for the respective metric. Note that while the prediction was done using the modified TensorFlow implementation, the errors were calculated in MATLAB. So there's still a chance that your slightly differing results are caused by other MATLAB/TF differences.

Hope this helps!