physics2014 opened this issue 5 years ago
I am wondering the same thing. Have you done any experiments @physics2014?
Also curious about this. For the 18-layer ResNet, if I use the same architecture and hyperparameters as in 18-layer/Bi-Real-net-18-solver.prototxt for full-precision pretraining, I get to ~49% top-1 accuracy, significantly below the reported result. Has anyone had more success?
@koenhelwegen Is the ~49% top-1 obtained on the float model, or on the binary model initialized from the float model?
The binary model initialized by the full precision model.
I implemented and trained Bi-Real Net 18 in PyTorch and got ~56% top-1 even without the full-precision model, but I haven't tried this repo.
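For reference, the key piece of such a PyTorch implementation is the binary activation: the Bi-Real Net paper replaces the plain straight-through estimator with a piecewise polynomial gradient (ApproxSign). A minimal sketch of that activation, assuming the ApproxSign derivative from the paper (not the actual code mentioned above):

```python
import torch


class BinaryActivation(torch.autograd.Function):
    """Sign in the forward pass; the piecewise polynomial
    (ApproxSign) gradient from the Bi-Real Net paper in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d/dx ApproxSign(x) = 2 + 2x on [-1, 0), 2 - 2x on [0, 1),
        # and 0 elsewhere, i.e. max(0, 2 - 2|x|)
        return grad_output * (2 - 2 * x.abs()).clamp(min=0)


binary_activation = BinaryActivation.apply
```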
Interesting! What learning rate schedule did you use? Would you mind sharing the code?
You can mail me ;)
@daquexian Would you mind sharing the code with me, too? P.S. I have emailed you, but maybe you were too busy to notice it :-D
To pre-train the Bi-Real Net 18, we use the same hyperparameter settings as the real-valued ResNet. But we use the same architecture as the Bi-Real Net 18 binary version, except that the binary convolution layer is replaced with a regular convolution layer and the sign function is replaced with the ReLU function.
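Concretely, here is a minimal PyTorch sketch of my reading of this setup; the repo itself is Caffe, and the `binary` flag, `binary_sign` helper, and block layout below are illustrative only (stride-2/downsample blocks omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def binary_sign(x):
    # Stand-in for the binary activation: sign with a straight-through
    # gradient. See the ApproxSign sketch earlier in this thread.
    return (torch.sign(x) - x).detach() + x


class BiRealBlock(nn.Module):
    """One Bi-Real basic block: activation -> 3x3 conv -> BN, plus the
    per-layer shortcut. With binary=False it becomes the real-valued
    pre-training block (ReLU, real conv), with the same graph otherwise."""

    def __init__(self, channels, binary=True):
        super().__init__()
        self.binary = binary
        # The binary model also binarizes the 3x3 conv weights; a plain
        # nn.Conv2d stands in for both variants in this sketch.
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = binary_sign(x) if self.binary else F.relu(x)
        out = self.bn(self.conv(out))
        return out + x  # Bi-Real shortcut around every conv layer
```

With `binary=False` this gives the real-valued pre-training network; switching to `binary=True` keeps the graph identical, which is what lets the pre-trained weights initialize the binary model directly.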
In the implementation of pre-training, you first train a real-valued ResNet to initialize the BNN, using the same hyperparameter settings as the original ResNet. The architectures of Bi-Real-net and the standard ResNet are different; which one do you use for pre-training?
If you use the standard ResNet architecture, is it effective to load the pre-trained weights into the Bi-Real-net, which has a different inference graph?
If you use the Bi-Real-net architecture, does it still work with the same hyperparameter settings as the original ResNet, given that the pooling, batch-norm, and activation layers are stacked differently in a binary CNN and a regular CNN?