Rasmuskh opened this issue 3 years ago
Hi Rasmuskh,
Thank you for this question; it is an interesting observation.
For the VGG-like network, we simply use the same network structure as the following paper: Alizadeh, Milad, et al. "An empirical study of binary neural networks' optimisation." ICLR 2019.
The code accompanying Alizadeh et al. can be found here.
We did not make a detailed analysis of the network structure and simply used the same one as Alizadeh et al. for ease of comparison. Intuitively, it might be that the final output values without normalization are not suitable for the loss function used, e.g., their absolute magnitude is too large due to the constraint of binary weights. This can be checked by plotting a histogram of the output values without normalization and comparing it with a histogram of the BN output. I hope this conjecture is helpful.
Best regards, Xiangming
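To make the suggested check concrete, here is a minimal sketch of the histogram comparison. It assumes a PyTorch model and a data batch taken from the user's own training script; the helper name `compare_output_histograms` and the idea of passing the final linear layer and the final BN layer as handles are illustrative, not part of the BayesBiNN code.

```python
import matplotlib.pyplot as plt
import torch

def compare_output_histograms(model, images, final_linear, final_bn):
    """Plot histograms of the network output before and after the final
    batch-norm layer on one batch, to check whether the un-normalized
    outputs have a much larger magnitude (the conjecture above)."""
    activations = {}

    def save(name):
        def hook(module, inputs, output):
            activations[name] = output.detach().cpu().flatten()
        return hook

    # Temporarily hook both layers so one forward pass records both tensors.
    h1 = final_linear.register_forward_hook(save("without BN"))
    h2 = final_bn.register_forward_hook(save("with BN"))
    with torch.no_grad():
        model.eval()
        model(images)
    h1.remove()
    h2.remove()

    for name, values in activations.items():
        plt.hist(values.numpy(), bins=100, alpha=0.5, label=name)
    plt.xlabel("output value")
    plt.legend()
    plt.show()
```

If the "without BN" histogram is much wider than the "with BN" one, that would support the magnitude explanation above.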
Thank you, that is very helpful :) I will have a look at the Alizadeh paper.
Hi, I noticed that you add a batchnorm layer as the final layer of your VGG-like network. Could you explain why this is necessary?
I am using your code to train a ResNet18 model with your BayesBiNN optimizer, and I noticed that adding batchnorm at the output layer is necessary for this model as well in order to achieve good performance (with it, the model performs very well).
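For readers wanting to reproduce this setup, "batchnorm at the output layer" amounts to appending a 1-D batch-norm after the final linear layer. Below is a minimal sketch, assuming a standard torchvision ResNet-18 and 10 output classes; the exact model definition used in the BayesBiNN repo may differ.

```python
import torch.nn as nn
from torchvision.models import resnet18

def resnet18_with_output_bn(num_classes=10):
    """Illustrative sketch: follow the final 10-way linear layer of a
    torchvision ResNet-18 with a BatchNorm1d over the logits."""
    model = resnet18(num_classes=num_classes)
    model.fc = nn.Sequential(
        model.fc,                     # original final linear layer
        nn.BatchNorm1d(num_classes),  # normalizes the logits before the loss
    )
    return model
```

The extra `BatchNorm1d` rescales the logits produced by the binary-weight network before they reach the loss, which matches the magnitude explanation given in the reply above.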