physics2014 opened this issue 5 years ago
I am wondering the same thing. Have you done any experiments @physics2014?
Also curious about this. For the 18-layer ResNet, if I use the same architecture and hyperparameters as in 18-layer/Bi-Real-net-18-solver.prototxt for full-precision pretraining, I get to ~49% top-1 accuracy, significantly below the reported result. Has anyone had more success?
@koenhelwegen Is the ~49% top-1 obtained on the float model, or on the binary model initialized from the float model?
The binary model initialized by the full precision model.
I implemented and trained Bi-Real Net 18 in PyTorch and got ~56% top-1 even without the full-precision model, but I haven't tried this repo.
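For reference, the key piece of such a PyTorch implementation is the binary activation: the Bi-Real Net paper replaces the plain straight-through estimator with a piecewise polynomial gradient (ApproxSign). A minimal sketch of that activation, assuming the ApproxSign derivative from the paper (not the actual code mentioned above):

```python
import torch


class BinaryActivation(torch.autograd.Function):
    """Sign in the forward pass; the piecewise polynomial
    (ApproxSign) gradient from the Bi-Real Net paper in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d/dx ApproxSign(x) = 2 + 2x on [-1, 0), 2 - 2x on [0, 1),
        # and 0 elsewhere, i.e. max(0, 2 - 2|x|)
        return grad_output * (2 - 2 * x.abs()).clamp(min=0)


binary_activation = BinaryActivation.apply
```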
Interesting! What learning rate schedule did you use? Would you mind sharing the code?
You can mail me ;)
@daquexian Would you mind sharing the code with me, too? P.S. I have emailed you, but maybe you were too busy to notice it :-D
To pre-train the Bi-Real Net 18, we use the same hyperparameter settings as the real-valued ResNet. But we use the same architecture as the Bi-Real Net 18 binary version, except that the binary convolution layer is replaced with a regular convolution layer and the sign function is replaced with the ReLU function.
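Concretely, here is a minimal PyTorch sketch of my reading of this setup; the repo itself is Caffe, and the `binary` flag, `binary_sign` helper, and block layout below are illustrative only (stride-2/downsample blocks omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def binary_sign(x):
    # Stand-in for the binary activation: sign with a straight-through
    # gradient. See the ApproxSign sketch earlier in this thread.
    return (torch.sign(x) - x).detach() + x


class BiRealBlock(nn.Module):
    """One Bi-Real basic block: activation -> 3x3 conv -> BN, plus the
    per-layer shortcut. With binary=False it becomes the real-valued
    pre-training block (ReLU, real conv), with the same graph otherwise."""

    def __init__(self, channels, binary=True):
        super().__init__()
        self.binary = binary
        # The binary model also binarizes the 3x3 conv weights; a plain
        # nn.Conv2d stands in for both variants in this sketch.
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = binary_sign(x) if self.binary else F.relu(x)
        out = self.bn(self.conv(out))
        return out + x  # Bi-Real shortcut around every conv layer
```

With `binary=False` this gives the real-valued pre-training network; switching to `binary=True` keeps the graph identical, which is what lets the pre-trained weights initialize the binary model directly.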
In the implementation of pre-training, you first train a real-valued ResNet to initialize the BNN, using the same hyperparameter settings as the original ResNet. The architectures of Bi-Real-net and the standard ResNet are different; which one do you use for pre-training?
If you use the standard ResNet architecture, is it effective to load the pre-trained weights into the Bi-Real-net, which has a different inference graph?
If you use the Bi-Real-net architecture, does it still work with the same hyperparameter settings as the original ResNet, given that the pooling, batch-norm, and activation layers are stacked differently in a binary CNN and a regular CNN?