mattpoggi / mono-uncertainty

CVPR 2020 - On the uncertainty of self-supervised monocular depth estimation

Reproducing Monodepth2-Self #20

Open fkluger opened 2 years ago

fkluger commented 2 years ago

Hi,

I want to apply your self-teaching approach to a different dataset and therefore have to train it myself. To make sure I get everything right, I am first trying to reproduce your results on KITTI, but so far without success.

First, I generated the teacher ground-truth using a modified test_simple.py, which stores the predicted depth for all 4 scales as .npy files:

python test_simple.py --image_path KITTI_RAW_DATA_PATH --image_file splits/eigen_zhou/train_files.txt --out_folder KITTI_TEACHER_DATA_PATH --model_name mono_640x192 --pred_depth

(Repeated the same with val_files.txt, of course.)
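
In case it helps, the saving step I added inside the per-image loop of test_simple.py looks roughly like this (simplified; outputs are the depth decoder outputs, image_name is the frame id from the split file, and --out_folder / --pred_depth are arguments I added myself):

import os
import numpy as np
from layers import disp_to_depth

# inside the loop over the images listed in the split file (sketch, not verbatim):
for scale in range(4):
    disp = outputs[("disp", scale)]
    # convert the sigmoid disparity to depth with the monodepth2 defaults (0.1, 100.0)
    _, depth = disp_to_depth(disp, 0.1, 100.0)
    np.save(os.path.join(args.out_folder, "{}_depth_{}.npy".format(image_name, scale)),
            depth.squeeze().cpu().numpy())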

I then trained the student network using the following loss function:

def compute_loss(self, inputs, outputs, use_sigmoid=True):
    final_loss = 0
    losses = {}

    for scale in range(self.num_scales):
        # teacher prediction, loaded from the pre-computed .npy files
        depth_teacher = inputs[("pred_depth", scale)]
        # student prediction, converted from disparity to depth
        _, depth_student = layers.disp_to_depth(outputs[("disp", scale)], self.min_depth, self.max_depth)

        if use_sigmoid:
            # treat the uncertainty output as sigma in (0, 1) via a sigmoid
            sigma = torch.sigmoid(outputs[("uncert", scale)]) + 1e-6
            log_sigma = torch.log(sigma)
        else:
            # treat the uncertainty output directly as log(sigma)
            log_sigma = outputs[("uncert", scale)]
            sigma = torch.exp(log_sigma) + 1e-6

        # uncertainty-weighted L1: |d_student - d_teacher| / sigma + log(sigma)
        l1_loss = torch.abs(depth_student - depth_teacher.cuda())
        loss = l1_loss / sigma + log_sigma

        losses[("l1_loss", scale)] = torch.mean(l1_loss)
        losses[("loss", scale)] = torch.mean(loss)
        final_loss += losses[("loss", scale)]

    final_loss /= self.num_scales
    losses["loss"] = final_loss
    return losses
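
(My intention is to minimize the per-pixel loss |d_student - d_teacher| / sigma + log(sigma), averaged over all pixels and over the 4 scales, which is how I read the self-teaching loss in the paper.)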

With use_sigmoid=False, I get a NaN loss in the last epoch. Evaluating the second-to-last checkpoint (weights_18) on eigen_benchmark yields:

   abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 |
     0.136 |    1.300 |    4.833 |    0.186 |    0.857 |    0.955 |    0.981 |

   abs_rel AUSE | abs_rel AURG | rmse AUSE | rmse AURG | a1 AUSE | a1 AURG |
          0.029 |        0.062 |     0.497 |     3.583 |   0.032 |   0.090 |

With use_sigmoid=True, training finishes without a problem and the last checkpoint (weights_19) yields:

   abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 |
     0.131 |    1.200 |    4.769 |    0.184 |    0.857 |    0.958 |    0.983 |

   abs_rel AUSE | abs_rel AURG | rmse AUSE | rmse AURG | a1 AUSE | a1 AURG |
          0.029 |        0.058 |     0.491 |     3.541 |   0.034 |   0.088 |

This is still far from what I get with your pre-trained network. Any idea where the problem might be? I am using PyTorch 1.10.1 instead of 0.4, since that old version is not compatible with newer GPUs, but otherwise I cannot find any differences from what you describe in your paper and published code. Training the original Monodepth2 with the newer PyTorch version also gives results nearly identical to the published ones (e.g. AbsRel 0.090 -> 0.091, RMSE 3.942 -> 3.993), so I doubt the PyTorch version is the problem.

Thanks!

mattpoggi commented 2 years ago

Sorry for the very late reply. How do you save the depth maps from the teacher? Do you save scaled depths, inverse depths, or something else?
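
For context: Monodepth2's decoder outputs a sigmoid disparity, and (from memory) disp_to_depth in layers.py converts it roughly like this, so saving disp, scaled_disp, or depth gives you three different things:

def disp_to_depth(disp, min_depth, max_depth):
    # map the sigmoid output to the [1/max_depth, 1/min_depth] disparity range
    min_disp = 1 / max_depth
    max_disp = 1 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    # invert the scaled disparity to obtain depth
    depth = 1 / scaled_disp
    return scaled_disp, depth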