I forgot to mention that the data used for training come from the NYU Depth V2 dataset; I aligned the RGB-D data with the MATLAB toolbox. From this data I took 12k images and applied the data augmentation.
Do you maybe normalize your training data but not your validation data? We train our network in meters, not millimeters, for example.
No, we don't do any kind of normalization, and our network works entirely in millimeters.
Hi! Can you please share your training code and tell how you solved this problem https://github.com/iro-cp/FCRN-DepthPrediction/issues/50 ? Thank you in advance.
@Zuriich can you show a plot of the training and validation loss? If it works for training but not for validation, there has to be a significant difference between the two sets, or between the training and validation procedures.
Sorry for the delayed response @chrirupp. I have solved the problem with the data; in fact, the data were normalized (but not on purpose). But now I have another problem with the output: the results have only a small distance error compared with the GT, but the geometry of the scene is not good.
The plot of our training and validation loss:
- Red line: training
- Green line: validation
And the new output of the network:
Overview:
Any idea what we're missing?
@Zuriich Hi :) From the images you attached, I believe there is a problem with the ground-truth depth maps. It looks like you interpolate between valid and invalid pixels, and this can cause confusion during training, especially around borders. You should switch to nearest-neighbor interpolation of the depth maps when resizing or doing augmentations to overcome this problem (and mask out all invalid pixels to prevent them from contributing to the loss). You will not have this "frame" artifact in your prediction then.
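A minimal sketch of that suggestion, assuming the depth maps are NumPy arrays in millimeters with 0 marking invalid pixels (the OpenCV usage and the target size are illustrative assumptions, not from this thread):

```python
# Sketch: resize a depth map with nearest-neighbor interpolation so that
# invalid (zero) pixels never blend into valid depths.
# Assumes depth in millimeters as a NumPy array, with 0 = invalid.
import cv2

def resize_depth(depth_mm, out_hw=(228, 304)):  # output size is illustrative
    h, w = out_hw
    # INTER_NEAREST copies the closest source pixel instead of averaging,
    # so no "mixed" depths appear around borders or missing regions.
    resized = cv2.resize(depth_mm, (w, h), interpolation=cv2.INTER_NEAREST)
    valid_mask = resized > 0  # only these pixels should contribute to the loss
    return resized, valid_mask
```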
Hi @iro-cp :) Ok, I'm gonna try the nearest-neighbor interpolation and send you the outputs. By masking out the invalid pixels, do you mean ignoring them completely or filling them in? Should I do that during training?
By masking out, we mean ignoring them during training, such that they do not contribute to the loss function and therefore to weight updates.
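A minimal sketch of that masking, assuming PyTorch and a boolean `valid` mask like the one in the earlier sketch (all names are illustrative):

```python
# Sketch: an L1 loss that ignores invalid pixels entirely. Because they are
# filtered out before the mean, they contribute nothing to the gradient
# and therefore nothing to the weight updates.
import torch

def masked_l1_loss(pred, target, valid):
    diff = (pred - target)[valid]  # keep only pixels with real ground truth
    return diff.abs().mean()
```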
What is this?
@Kimmins: It's a colorbar that's only there because I wanted the figure to be square. It's a placeholder, nothing more.
@Zuriich What is the meaning of these figures (such as 50, 100, 150, 200, etc.)?
They are the colorbars corresponding to each depth map, so each number is the corresponding distance; in this case, the lighter the color, the greater the distance. In the figure you have the ground truth captured by the Kinect camera, the depth map predicted by this algorithm, and the difference between them (the blue one).
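A sketch of how this kind of figure is typically produced, assuming matplotlib and depth maps as NumPy arrays in millimeters (`gt` and `pred` are illustrative names):

```python
# Sketch: ground truth, prediction, and their difference side by side,
# each with a colorbar whose tick labels are distances in millimeters.
import numpy as np
import matplotlib.pyplot as plt

def show_depths(gt, pred):
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    panels = [(gt, "Ground truth"), (pred, "Prediction"),
              (np.abs(gt - pred), "Difference")]
    for ax, (img, title) in zip(axes, panels):
        im = ax.imshow(img)      # with the default colormap, lighter = farther
        ax.set_title(title)
        fig.colorbar(im, ax=ax)  # ticks (50, 100, 150, ...) are depths
    plt.show()
```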
Thank you
Hi,
I am trying to replicate the output of your network. I have created my training code based on the details you provide in the paper (data augmentation, initial learning rate of 0.01, pretrained ResNet-50 weights, ignoring the invalid pixels, the inverse Huber (berHu) loss, batch size 16, and 20 epochs). The results I got from the network look like this:
As you can see, the prediction of the network and the real GT look very similar from the geometrical perspective, but it seems they are on a different scale, or the network has learned from very low-resolution images and upsampled them (the darkest pixels have values of 990-999 mm and the lightest ones have values around 199X mm).
If anyone can shed some light on this, please do. Best regards.
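For reference, a minimal sketch of the reverse Huber (berHu) loss mentioned above, assuming PyTorch and a boolean `valid` mask as in the earlier sketches. The threshold rule (c equal to one fifth of the maximal per-batch absolute error) follows the paper; everything else is illustrative:

```python
# Sketch: berHu loss -- L1 for small errors, scaled L2 beyond a threshold c,
# continuous at |x| = c. Invalid pixels are masked out first.
import torch

def berhu_loss(pred, target, valid):
    diff = (pred - target)[valid].abs()
    c = 0.2 * diff.max().clamp(min=1e-6)  # c = max|error| / 5 per batch
    small = diff[diff <= c]               # linear region: |x|
    large = diff[diff > c]                # quadratic region: (x^2 + c^2) / (2c)
    return (small.sum() + ((large ** 2 + c ** 2) / (2 * c)).sum()) / diff.numel()
```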