victoriamazo / depth_regression

Monocular/stereo depth estimation with regression

Reproducing work in Literature + general questions #1

Open ajinkyakhoche opened 5 years ago

ajinkyakhoche commented 5 years ago

Dear Victoria,

First of all, thank you for this work. I am a beginner in Deep Learning trying to work with Depth Estimation. Here are my questions:

  1. I'm trying to reproduce the work of Godard et al. and Kuznietsov et al. Could you guide me on how to modify your models (DispNet and ResNet respectively) to obtain the networks used by them?
  2. Could you elaborate on the terms 'disp_norm' and 'upscaling' used in losses.py? Among the papers you cite, which ones use these terms?
  3. In losses.py, line 45, should `loss_dict['loss_DS'] = depth_supervised_loss(depth_r[0], var_dict_t['gt_depth_r'], loss_params_dict)` instead be `loss_dict['loss_DS'] += depth_supervised_loss(depth_r[0], var_dict_t['gt_depth_r'], loss_params_dict)`?
  4. Is there a reason why you didn't implement the Appearance Matching loss (Eq. (2) in Godard's paper)? Does the 'w_RL' flag in the loss dict represent this loss?
victoriamazo commented 5 years ago

Hi!

  1. Kuznietsov's ResNet50 model is the same as the ResNet50 model in this repo. Godard uses VGG and ResNet50 models, but his ResNet50 model is slightly different, since he uses a different number of channels in the upsampling part than Kuznietsov does. You can modify the ResNet50 model in this repo by changing the number of filters to get Godard's model.
  2. Godard, in "Digging Into Self-Supervised Monocular Depth Estimation" (the previous version of the paper, 2018.06_Godard - Digging Into Self-Supervised Monocular Depth Estimation.pdf; the new version is different), uses the depth normalization trick (the 'disp_norm' flag):

d̂_t = d_t / mean(d_t), where d̂_t is the mean-normalized inverse depth for I_t, as used by Wang, C., Buenaposada, J.M., Zhu, R., Lucey, S.: Learning depth from monocular videos using direct methods.

and the upsampling trick (the 'upscaling' flag):

We propose an improvement to this multi-scale formulation. Instead of computing the photometric error on the ambiguous low-resolution images, we first upsample the lower resolution depth maps to the input image resolution and then warp and compute the photometric error pe at this higher input resolution. This effectively constrains the depth maps from each resolution to work towards the exact same objective i.e. reconstructing the high resolution input target image as accurately as possible. We found that this significantly improves the depth accuracy, while also reducing the texture-copy artifacts which are very noticeable in the previous multi-scale formulation as can be seen in Fig. 5. This is related to matching patches, a standard practice in stereo reconstruction [40], as a low-resolution disparity value will be responsible for reprojecting an entire patch of pixels in the high resolution image.

I have found that they do not help in the regression network.
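For concreteness, here is a minimal PyTorch sketch of how the two tricks fit into a multi-scale photometric loss. The names (`disp_pyramid`, `img_tgt`, `warp_fn`) are illustrative placeholders, not this repo's actual API:

```python
import torch
import torch.nn.functional as F

def multiscale_photometric_loss(disp_pyramid, img_tgt, warp_fn,
                                disp_norm=True, upscaling=True):
    """Illustrative multi-scale loss with the 'disp_norm' and 'upscaling' tricks.

    disp_pyramid: list of predicted disparity maps [B, 1, h_i, w_i]
    img_tgt:      target image [B, 3, H, W]
    warp_fn:      placeholder that reconstructs img_tgt from the source view
                  given a full-resolution disparity map
    """
    H, W = img_tgt.shape[-2:]
    total = 0.0
    for disp in disp_pyramid:
        if disp_norm:
            # depth normalization trick: divide by the per-image mean
            disp = disp / (disp.mean(dim=(2, 3), keepdim=True) + 1e-7)
        if upscaling:
            # upsampling trick: bring the low-res prediction to the input
            # resolution before warping, so every scale optimizes the same
            # objective (reconstructing the full-resolution target image)
            disp = F.interpolate(disp, size=(H, W), mode='bilinear',
                                 align_corners=False)
        img_recon = warp_fn(disp)                           # warp source view
        total = total + (img_recon - img_tgt).abs().mean()  # L1 photometric error
    return total / len(disp_pyramid)
```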

  3. There is no need for the summation, since this loss is calculated once per batch. If there are several losses, they are added together in the for loop: `loss += loss_weights_dict['w_{}'.format(loss_name[5:])] * loss_value`.
  4. I have not implemented the SSIM loss because it is usually used in the self-supervised case (in addition to an L1 loss), but not in regression with an L2 loss.
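For reference, Godard's appearance matching loss (Eq. (2)) combines SSIM and L1 terms as alpha * (1 - SSIM)/2 + (1 - alpha) * L1 with alpha = 0.85. A minimal PyTorch sketch, assuming the simplified 3x3 average-pooling SSIM from his monodepth code (this is not part of this repo):

```python
import torch
import torch.nn.functional as F

def ssim(x, y):
    """Simplified SSIM over 3x3 average-pooling windows (as in monodepth)."""
    C1, C2 = 0.01 ** 2, 0.03 ** 2
    mu_x = F.avg_pool2d(x, 3, 1)
    mu_y = F.avg_pool2d(y, 3, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1) - mu_x * mu_y
    ssim_n = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    ssim_d = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return ssim_n / ssim_d

def appearance_matching_loss(img, img_recon, alpha=0.85):
    """Godard's Eq. (2): weighted mix of SSIM and L1 photometric terms."""
    ssim_term = torch.clamp((1 - ssim(img, img_recon)) / 2, 0, 1)
    # the SSIM map is 2 px smaller per side ('valid' pooling); crop L1 to match
    l1_term = (img - img_recon).abs()[..., 1:-1, 1:-1]
    return (alpha * ssim_term + (1 - alpha) * l1_term).mean()
```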