lhoyer / DAFormer

[CVPR22] Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

Questions about whether the teacher network is trained or not #59

Closed: Mamduh-k closed this issue 1 year ago

Mamduh-k commented 1 year ago

Dear author, is the ema_model trainable in any way other than being updated through the EMA?

Mamduh-k commented 1 year ago

Also, I would like to know why there is no gradient backpropagation to the teacher network.

lhoyer commented 1 year ago

The teacher is only updated through an exponential moving average (EMA) of the student's network weights, which implements a temporal ensemble. Therefore, no gradients are backpropagated into the teacher. Please have a look at the Mean Teacher paper (https://arxiv.org/pdf/1703.01780.pdf) for further details.
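For illustration, here is a minimal sketch of such an EMA update in PyTorch; the function name `ema_update` and the decay factor `alpha` are chosen for this example and are not the exact identifiers used in this repository:

```python
import torch

@torch.no_grad()  # no autograd graph: the teacher is never trained by SGD
def ema_update(teacher, student, alpha=0.999):
    # Move each teacher weight slightly toward the corresponding student
    # weight: teacher = alpha * teacher + (1 - alpha) * student.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1 - alpha)
```

Because the update runs under `torch.no_grad()`, the teacher acts purely as a slowly moving average of the student, i.e. a temporal ensemble of past student weights.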

Mamduh-k commented 1 year ago

Thank you for your answer. Since no gradients ever seem to flow back into the teacher network during training, why is detach() needed?

lhoyer commented 1 year ago

The mix_loss.backward() uses pseudo-labels predicted by the teacher network. Without the detach(), gradients could be backpropagated into the teacher through these predictions.
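As a hedged sketch of where this happens (the function name `generate_pseudo_labels` is illustrative; the actual logic lives in the repository's self-training code):

```python
import torch

def generate_pseudo_labels(teacher, target_images):
    # The teacher's logits are detached so that the computation graph of the
    # student's mix_loss ends at the pseudo-labels instead of extending into
    # the teacher's forward pass.
    ema_logits = teacher(target_images)
    ema_softmax = torch.softmax(ema_logits.detach(), dim=1)
    # Per-pixel confidence and hard pseudo-label.
    pseudo_prob, pseudo_label = torch.max(ema_softmax, dim=1)
    return pseudo_label, pseudo_prob
```

The argmax itself is non-differentiable, but quantities such as the softmax confidences (which can enter the loss, e.g. as per-pixel weights) would otherwise still carry gradients back into the teacher, so detaching the logits makes the cut explicit.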