lhoyer / improving_segmentation_with_selfsupervised_depth

[CVPR21] Implementation of our work "Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation"

Usage of EMA model for exp:212 #10

Closed nbansal90 closed 3 years ago

nbansal90 commented 3 years ago

Hey @lhoyer ,

I had a small query about the use of the ema_model for exp id: 212. It is used during training for the unsupervised semantic segmentation part, where it generates the pseudo-labels:

```python
# Generate pseudo-labels with the EMA (teacher) model on the unlabeled inputs
self.ema_model.use_pose_net = False
logits_u_w = self.ema_model(unlabeled_inputs)["semantics"]
softmax_u_w = torch.softmax(logits_u_w.detach(), dim=1)
```
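Continuing from the snippet above, in a typical mean-teacher setup the teacher's softmax is then converted into hard pseudo-labels and a confidence mask for the student's unsupervised loss. The sketch below only illustrates that general pattern and is not the repository's exact code; the 0.9 threshold and the names `conf_mask`, `per_pixel_loss`, and `unsup_loss` are assumptions.

```python
import torch
import torch.nn.functional as F

# Illustrative continuation (not the repo's exact code): derive hard pseudo-labels
# from the teacher softmax and mask out low-confidence pixels.
max_prob, pseudo_label = torch.max(softmax_u_w, dim=1)   # both of shape (N, H, W)
conf_mask = (max_prob >= 0.9).float()                    # assumed confidence threshold

# Student prediction on the same unlabeled batch, supervised by the teacher's pseudo-labels
logits_u_s = self.model(unlabeled_inputs)["semantics"]
per_pixel_loss = F.cross_entropy(logits_u_s, pseudo_label, reduction="none")
unsup_loss = (per_pixel_loss * conf_mask).mean()
```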

The ema_model is then updated after each training step:

```python
# Update the EMA (teacher) weights from the student model after every training step
if self.ema_model is not None:
    self.ema_model = self.update_ema_variables(ema_model=self.ema_model, model=self.model,
                                               alpha_teacher=0.99, iteration=step)
```
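For reference, update_ema_variables maintains an exponential moving average of the student's weights, which is what makes the teacher a temporal ensemble. Below is a minimal sketch of what such an update typically looks like; the alpha ramp-up via `min(1 - 1/(iteration + 1), alpha_teacher)` is a common convention (e.g. in Mean Teacher) and is an assumption here, not necessarily the repository's exact implementation.

```python
def update_ema_variables(ema_model, model, alpha_teacher, iteration):
    # Ramp alpha up during early iterations, then hold it at alpha_teacher (assumed behaviour)
    alpha = min(1 - 1 / (iteration + 1), alpha_teacher)
    for ema_param, param in zip(ema_model.parameters(), model.parameters()):
        # theta_teacher <- alpha * theta_teacher + (1 - alpha) * theta_student
        ema_param.data.mul_(alpha).add_(param.data, alpha=1 - alpha)
    return ema_model
```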

My questions were:

1. Is the ema_model only used during training, and why is it not saved with the model checkpoints?
2. Why is the ema_model used to generate the pseudo-labels instead of the best model on the validation set?

Regards, Nitin

lhoyer commented 3 years ago

Hi Nitin,

The ema model is only used during training to generate pseudo-labels. As it is not needed for inference, we do not save it. We use the ema model to generate the pseudo-labels because it acts as a temporal ensemble and produces more stable pseudo-labels. We do not use the best model (selected on the validation set) to generate the pseudo-labels, as this would incorporate validation data into the training process.

Best regards, Lukas

nbansal90 commented 3 years ago

Thanks @lhoyer for your prompt reply. I was wondering because the ema_model is used to guide the student model during the self-supervised segmentation step, so the assumption (as you also point out) is that the ema model (teacher) is in some sense better than the student model.

If that is the case, why do we save only the student model, based on its performance on the validation set, and not the ema model, since the latter might generalize better than the student model?

lhoyer commented 3 years ago

The ema model is more temporally stable and its predictions do not fluctuate as much, which is useful for self-training. On the other hand, if I remember correctly, it sometimes misses fine segmentation details and small/rare classes. However, we did not study this aspect further and basically follow previous works such as CutMix and ClassMix.