Open cheerss opened 4 years ago
I have the same question 0.0
Hi, I think i have known why there are no BN layers in teacher structure,
"Folded Models below have batch_norm parameters and statistics folded into convolutional layers for speed. It is not recommended to use them for finetuning."
Thanks for your great work first!
I wonder why you do not use BN layer when inference the teacher model here( https://github.com/szagoruyko/attention-transfer/blob/master/imagenet.py#L117 )? Is it a typo?
Hope for your reply!