nbansal90 closed this issue 3 years ago.
Hi Nitin,
The ema model is only used during the training to generate pseudo-labels. As it is not needed for inference, we don't save it. We use the ema model to generate pseudo-labels as it acts as a temporal ensemble and generates more stable pseudo-labels. We don't use the best model to generate the pseudo-labels as this would result in validation data being incorporated into the training process.
Best regards, Lukas
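The pseudo-labelling step described above can be sketched as follows. This is a minimal illustration in plain Python, so it runs without PyTorch. It is not the repository's code. The flat per-pixel logit layout, the function names and the confidence threshold of 0.968 are all illustrative assumptions. The teacher only supplies targets. In a PyTorch setup, its forward pass would typically run under `torch.no_grad()` so that no gradients flow back into it.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_labels(teacher_logits: List[List[float]],
                  threshold: float = 0.968) -> Tuple[List[int], float]:
    """Turn teacher (EMA model) logits into hard pseudo-labels plus a weight.

    teacher_logits: one list of class logits per pixel (hypothetical layout).
    The returned weight is the fraction of pixels whose max softmax
    probability exceeds `threshold` (an assumed confidence-weighting scheme).
    """
    labels = []
    confident = 0
    for pixel in teacher_logits:
        probs = softmax(pixel)
        conf = max(probs)
        labels.append(probs.index(conf))  # argmax class becomes the target
        if conf > threshold:
            confident += 1
    weight = confident / len(teacher_logits)
    return labels, weight


labels, weight = pseudo_labels([[5.0, 0.0, 0.0], [0.2, 0.1, 0.0]])
print(labels, weight)  # [0, 0] 0.5
```

In this example, only the first pixel is confident enough, with a max probability of about 0.987. The pseudo-label loss would therefore be down-weighted by 0.5.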
Thanks @lhoyer for your prompt reply. I was only wondering because the `ema_model` is being used to guide the student model during the self-supervised semantic segmentation step. The assumption here (as you also point out) is that the EMA model (teacher) is in some sense better than the student model.
If that is the case, why do we only save the student model, selected by its performance on the validation set, and not the EMA model? The EMA model might be a better (more generalizable) model than the student model.
The ema model is more temporally stable and its predictions do not fluctuate that much, which is useful for the self-training. On the other hand, it sometimes lacks some segmentation details and small/rare classes if I remember correctly. However, we did not further study this aspect and basically follow previous works such as CutMix and ClassMix.
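A toy illustration of the "temporally stable" point follows. It is a minimal sketch that treats a single scalar as a stand-in for a network weight. The oscillating "student" trajectory and α = 0.99 are hypothetical values chosen for illustration. The EMA is a temporal ensemble: each step moves it only a fraction (1 − α) toward the current student value. As a result, its step-to-step changes are much smaller than the student's.

```python
from typing import List


def ema_trajectory(values: List[float], alpha: float = 0.99) -> List[float]:
    """Exponential moving average of a sequence, i.e. a temporal ensemble."""
    ema = [values[0]]
    for v in values[1:]:
        ema.append(alpha * ema[-1] + (1.0 - alpha) * v)
    return ema


def max_step_change(values: List[float]) -> float:
    """Largest absolute change between consecutive entries."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical noisy "student" weight: oscillates by +/-0.5 around 1.0.
student = [1.0 + 0.5 * (-1) ** t for t in range(200)]
teacher = ema_trajectory(student)

print(max_step_change(student))  # 1.0
print(max_step_change(teacher))  # small: at most (1 - alpha) * 1.0 per step
```

The same smoothing has a downside: the EMA lags behind recent updates. This is at least consistent with the observation above that the teacher can miss some details that the student has only recently learned.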
Hey @lhoyer,
I had a small query about using `ema_model` for exp id 212. We use it while training the model in the unsupervised semantic segmentation part, where it generates pseudo-labels. Then we update the `ema_model` after each training step. My questions were:
1. Are we saving the `ema_model` at any point in time as part of experiment 212?
2. Why do we use the teacher (`ema_model`) to get the pseudo-labels, and not the current best model?
Regards, Nitin
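The per-step update of the `ema_model` mentioned in the question can be sketched like this. It is a minimal, framework-free illustration in which parameters are plain floats in a dict. The names `update_ema`, `teacher` and `student` are hypothetical. The warm-up rule `min(1 - 1/(iteration + 1), alpha)` follows the style of common Mean Teacher implementations, and the default α = 0.99 is likewise an assumption here, not a quote from the repository. In PyTorch, the same arithmetic would loop over the teacher's and student's `parameters()` pairwise. Either way, the teacher is never updated by the optimizer, only by this averaging. That is also why it cannot leak validation information into the pseudo-labels.

```python
from typing import Dict


def update_ema(teacher: Dict[str, float], student: Dict[str, float],
               iteration: int, alpha: float = 0.99) -> None:
    """In-place EMA update of the teacher after one student training step.

    Early on, the effective alpha is small, so the teacher tracks the
    (still changing) student closely. Later it saturates at `alpha`.
    """
    alpha_t = min(1.0 - 1.0 / (iteration + 1), alpha)
    for name, s_val in student.items():
        teacher[name] = alpha_t * teacher[name] + (1.0 - alpha_t) * s_val


teacher = {"w": 0.0}
update_ema(teacher, {"w": 1.0}, iteration=0)  # alpha_t = 0.0 -> copy student
print(teacher["w"])  # 1.0
update_ema(teacher, {"w": 3.0}, iteration=1)  # alpha_t = 0.5 -> midpoint
print(teacher["w"])  # 2.0
```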