irolaina / FCRN-DepthPrediction

Deeper Depth Prediction with Fully Convolutional Residual Networks (FCRN)
BSD 2-Clause "Simplified" License

Output after training #55

Closed Zuriich closed 5 years ago

Zuriich commented 6 years ago

Hi,

I am trying to replicate the output of your network. I created my training code based on the details you provide in the paper (data augmentation, initial learning rate 0.01, pretrained ResNet-50 weights, ignoring the invalid pixels, the inverse Huber (berHu) loss, batch size 16, and 20 epochs). The results I got from the network look like this:

figure_1

As you can see, the prediction of the network and the real GT look very similar geometrically, but it seems they are on a different scale, or the network has learned from very low-resolution images and upsampled them (the black pixels have values of 990-999 mm, while the lightest ones have values of 199X).
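
For reference, my berHu implementation follows the definition in the paper; here it is as a rough NumPy sketch (illustrative only, not my actual training code):

```python
import numpy as np

def berhu(residuals):
    # Reverse Huber (berHu) loss as defined in the paper:
    # B(x) = |x|                if |x| <= c
    #      = (x^2 + c^2) / (2c) otherwise,
    # with c = 0.2 * max|x| over the current batch.
    a = np.abs(residuals)
    c = max(0.2 * a.max(), 1e-6)  # guard against an all-zero batch
    return np.where(a <= c, a, (a * a + c * c) / (2.0 * c)).mean()
```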

If anyone can shed some light on this, please do. Best regards.

Zuriich commented 6 years ago

I forgot to mention that the data used for training come from the NYU Depth V2 dataset; I aligned the RGB-D data with the MATLAB toolbox. From this data I took 12k images and applied data augmentation.

chrirupp commented 6 years ago

Do you maybe normalize your training data but not your validation data? We train our network in meters, not millimeters, for example.
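
A quick sanity check is to print the raw value range of both sets before training; a hypothetical helper (for NYU indoor scenes, meters land roughly in 0.5-10, millimeters in 500-10000):

```python
import numpy as np

def print_depth_stats(name, depth):
    # Compare the raw depth ranges of training vs. validation data;
    # a meters/millimeters or normalization mismatch shows up immediately.
    d = depth[depth > 0]  # ignore invalid (zero) pixels
    print(f"{name}: min={d.min():.2f}, max={d.max():.2f}, mean={d.mean():.2f}")
```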

Zuriich commented 6 years ago

No, we don't do any type of normalization, and our network works entirely in millimeters.

Antonyo314 commented 6 years ago

Hi! Can you please share your training code and explain how you solved this problem: https://github.com/iro-cp/FCRN-DepthPrediction/issues/50? Thank you in advance.

chrirupp commented 6 years ago

@Zuriich can you show a plot of the training and validation loss? If it works for training but not for validation, there has to be a significant difference between the two sets, or between the training and validation procedures.

Zuriich commented 6 years ago

Sorry for the delayed response @chrirupp. I have solved the problem with the data; in fact, the data were normalized (but not on purpose). But now I have another problem with the output: the results have a small distance error compared with the GT, but the geometry of the scene is not good.

The plot of our training and validation loss: photo_2018-03-12_19-12-06

- Red line: training
- Green line: validation

And the new output of the network: figure_1-8


Any idea what we're missing?

irolaina commented 6 years ago

@Zuriich Hi :) From the images you attached, I believe there is a problem with the ground truth depth maps. It looks like you interpolate between valid and invalid pixels, and this can cause confusion during training, especially around borders. You should switch to nearest-neighbor interpolation of the depth maps when resizing or doing augmentations (and mask out all invalid pixels to prevent them from contributing to the loss). You will not have this "frame" artifact in your prediction then.
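
As an illustration, the resizing step could look something like this (a PIL/NumPy sketch; your augmentation pipeline will differ in the details):

```python
import numpy as np
from PIL import Image

def resize_depth(depth, size):
    # Nearest neighbor keeps every output value equal to some real
    # input depth; bilinear/bicubic would blend valid depths with
    # invalid (zero) pixels at borders and create the "frame" artifact.
    img = Image.fromarray(depth.astype(np.float32))  # mode "F"
    return np.asarray(img.resize(size, resample=Image.NEAREST))
```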

Zuriich commented 6 years ago

Hi @iro-cp :) OK, I'm going to try nearest-neighbor interpolation and send you the outputs. By masking out the invalid pixels, do you mean ignoring them completely or filling them in? Should I do that during training?

irolaina commented 6 years ago

By masking out, we mean ignoring them during training, so that they contribute neither to the loss function nor, therefore, to the weight updates.
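
For example, as a NumPy sketch (assuming invalid pixels are encoded as zero depth, as in the raw NYU data):

```python
import numpy as np

def masked_residuals(pred, gt):
    # Select residuals only at valid pixels; invalid ones then
    # contribute neither to the loss nor to the gradients.
    valid = gt > 0  # Kinect encodes missing depth as 0
    return (pred - gt)[valid]
```

This composes with the berHu sketch earlier in this thread, e.g. `loss = berhu(masked_residuals(pred, gt))`.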

Kimmins commented 5 years ago

image

What is this?

Zuriich commented 5 years ago

@Kimmins: It's a colorbar that's only there because I wanted the figure to be square. It's a placeholder, nothing more.

Kimmins commented 5 years ago

@Zuriich What's the meaning of these figures (such as 50, 100, 150, 200, etc.)?

Zuriich commented 5 years ago

They are the colorbars corresponding to each depth map, so each number is the corresponding distance. In this case, the lighter the color, the greater the distance. In the figure you have the ground truth captured by the Kinect camera, the predicted depth map from this algorithm, and the difference between them (the blue one).

Kimmins commented 5 years ago

Thank you
