irolaina / FCRN-DepthPrediction

Deeper Depth Prediction with Fully Convolutional Residual Networks (FCRN)
BSD 2-Clause "Simplified" License

Training details #32

Closed harsanyika closed 6 years ago

harsanyika commented 7 years ago

Hello!

I am trying to recreate your results on the NYU_depth dataset with PyTorch. I am fairly confident that my network structure, loss function, and data augmentation process are correct, but I am unable to reach depth image quality similar to your TensorFlow outputs (see the attached images).

My guess is that the difference might be in the training process. I tried to follow your article, but a few details are unclear. You wrote that you gradually reduce the learning rate when you observe plateaus. How do you define a plateau, and what does "gradually" mean in this case?

To get the results below, I used the SGD optimizer with an initial learning rate of 0.01 and momentum 0.9, halving the learning rate every 7 epochs.

some test image: test2

the output using your Tensorflow network: your_results_with_tf_

the output using my Pytorch network: test2_res_2_
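For completeness, a minimal PyTorch sketch of the optimizer and schedule described above (the tiny model here is just a placeholder for my FCRN implementation):

```python
import torch

# Placeholder model; substitute your FCRN implementation.
fcrn = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

# SGD with initial LR 0.01 and momentum 0.9; halve the LR every 7 epochs.
optimizer = torch.optim.SGD(fcrn.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.5)

for epoch in range(20):
    # ... one full pass over the training data goes here ...
    scheduler.step()  # decay the learning rate once per epoch
```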

chrirupp commented 7 years ago

Hi, are you perhaps using just the labeled subset of NYU Depth for training?

harsanyika commented 7 years ago

I am using ~12k images chosen evenly from the scenes in the unlabeled dataset, and I use online augmentation to expand each epoch to ~95k images, roughly as sketched below.
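The online augmentation is along these lines (the specific transforms and ranges here are illustrative, not necessarily exactly what the paper uses):

```python
import random
import torchvision.transforms.functional as TF

def augment(rgb, depth):
    """Apply the same geometric transform to the RGB image and its depth map."""
    # Random horizontal flip, applied jointly so image and depth stay aligned.
    if random.random() < 0.5:
        rgb, depth = TF.hflip(rgb), TF.hflip(depth)

    # Small random rotation; the +/- 5 degree range is my own choice.
    angle = random.uniform(-5.0, 5.0)
    rgb = TF.rotate(rgb, angle)
    depth = TF.rotate(depth, angle)

    # Photometric jitter on the RGB only; depth values are left untouched.
    rgb = TF.adjust_brightness(rgb, random.uniform(0.8, 1.2))

    return rgb, depth
```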

chrirupp commented 7 years ago

Ok, what is the error that you get in the end?

harsanyika commented 7 years ago

The final batch-averaged reverse Huber (berHu) loss is 0.1557 on the 654-image test set (I do both training and evaluation with a batch size of 16).
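For reference, this is roughly how I compute the reverse Huber loss in PyTorch, with the threshold set to 20% of the maximum absolute residual as I understand it from the paper (my own implementation, so treat it as a sketch):

```python
import torch

def berhu_loss(pred, target):
    """Reverse Huber (berHu) loss: L1 below the threshold c, scaled L2 above it."""
    residual = (pred - target).abs()
    c = 0.2 * residual.max()                         # threshold: 20% of the max absolute residual
    l1 = residual                                    # |x| for |x| <= c
    l2 = (residual ** 2 + c ** 2) / (2 * c + 1e-8)   # (x^2 + c^2) / (2c) for |x| > c
    return torch.where(residual <= c, l1, l2).mean()
```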

chrirupp commented 7 years ago

Can you also tell me the relative error and RMSE? Then it is much easier to compare. Make sure to compute these in the original resolution of 640x480 to be comparable to ours.

I am asking because, for differently (randomly) initialized training runs, it is natural for some individual predictions to differ while the overall performance stays almost the same.

harsanyika commented 7 years ago

I am a little unsure about how you calculate the relative error and the RMSE in your article. I understand that you leave the ground-truth images in the validation set untouched (they stay at 640x480). But what about the RGB inputs? If I center-crop to get the right size (304x228), I lose some pixels at the sides, so even if the network works perfectly, the up-sampled output (from 160x128 to 640x480) and the ground truth won't match exactly.

chrirupp commented 7 years ago

You can find the way we do this in the MATLAB code for evaluating NYU depth. Basically, we downsample and remove a small border of invalid pixels that comes from warping the RGB image onto the depth image. But since you just want to find out whether your model performs similarly to ours, you can apply the same evaluation to both models and the numbers should be comparable.
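As a rough Python sketch of what such an evaluation does (not our exact MATLAB code; the border width here is only illustrative):

```python
import torch
import torch.nn.functional as F

def eval_metrics(pred, gt, border=8):
    """Upsample the prediction to ground-truth resolution, then compute
    rel and RMSE over valid pixels only. A sketch, not the exact evaluation."""
    # pred: 1x1x128x160 network output, gt: 1x1x480x640 ground-truth depth
    pred = F.interpolate(pred, size=gt.shape[-2:], mode='bilinear', align_corners=False)

    # Crop an (illustrative) border and keep only pixels with valid depth.
    pred = pred[..., border:-border, border:-border]
    gt = gt[..., border:-border, border:-border]
    valid = gt > 0

    rel = ((pred[valid] - gt[valid]).abs() / gt[valid]).mean()
    rmse = torch.sqrt(((pred[valid] - gt[valid]) ** 2).mean())
    return rel.item(), rmse.item()
```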

harsanyika commented 7 years ago

I just finished a longer training session, and this time I got better results (the losses are similar to yours). I think the blurriness of my predictions was caused by high learning rates. Now that I have trained for more than 50 epochs, the learning rate has shrunk further and the predicted depth images are much cleaner. I think I could achieve comparable results in only 20 epochs with a faster learning rate decay.

JackHenry1992 commented 6 years ago

Hi @iro-cp, I can't find the test images on the official NYU website. Could you provide the original test images used in your paper? Results can only be validated properly when the same test images are used, so it's important to share a common standard.

zhangshaoyong commented 6 years ago

@harsanyika Hi, I am trying to reproduce the authors' results on the NYU_depth dataset with TensorFlow, using the published test code directly to build the network. During training I found that the network output is not between 0 and 1, so I don't know how to process the ground truth to calculate the loss. Did you also encounter this problem, and how did you handle it? Thank you.

E-MHJ commented 6 years ago

Hi @harsanyika, I am looking for training code to use as a reference for my project. Could you share yours?

chrirupp commented 6 years ago

Seems like the original problem is solved. Please open a separate issue if you have further problems.

Ariel-JUAN commented 6 years ago

@harsanyika Hi, I am trying to reproduce the FCRN work, but I can't get good results. Could you give me some advice on the training details? Are you using the raw NYU images, or do you fill in the invalid pixels? I removed the ReLU from the final layer, but when I set the learning rate to 0.01 the network seems to learn nothing, and sometimes the initial loss becomes very large or even NaN. Any advice would be much appreciated.

harsanyika commented 6 years ago

Hi. I am not sure that I am the best person to give advice; my results are still somewhat worse than those in the original article. Anyway, I was working with the raw images. If you have problems during training, try to overfit the network on a small, fixed subset of the training images:

If your network architecture and your loss function are well defined, the network should learn these images almost perfectly. Otherwise, you will know that there is a problem.
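For example, a toy overfitting loop could look roughly like this (the tiny convolutional model and random tensors are just stand-ins for a real FCRN and real data):

```python
import torch

# Toy stand-ins: replace with your FCRN model and a small fixed batch of real data.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)
rgb = torch.rand(4, 3, 228, 304)      # 4 RGB inputs
depth = torch.rand(4, 1, 228, 304)    # matching depth maps

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = torch.nn.L1Loss()           # or your berHu loss

# Train repeatedly on the same tiny batch; the loss should approach zero
# if the architecture and loss are wired up correctly.
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(rgb), depth)
    loss.backward()
    optimizer.step()

print(loss.item())
```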

irolaina commented 6 years ago

@Ariel-JUAN Hi, the appropriate learning rate depends on whether or not you divide your loss by the number of pixels per image, and also on the loss function itself. With berHu we could start with a higher learning rate than with L2. We did not fill in invalid pixels; instead, we mask them out during training.
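For illustration, masking out invalid (zero-depth) pixels in the loss can look roughly like this; this is only a sketch, not our actual training code:

```python
import torch

def masked_l2_loss(pred, target):
    """L2 loss over valid pixels only; invalid pixels are marked with depth 0."""
    valid = target > 0
    diff = pred[valid] - target[valid]
    return (diff ** 2).mean()
```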

jeslykp commented 5 years ago

Could you please tell me how we can upsample the predictions back to the original resolution?