hpi-xnor / BNext

Join the High Accuracy Club on ImageNet with A Binary Neural Network Ticket
Apache License 2.0

How to train/eval with the network (BNext-Tiny or BNext_quant-Tiny) #1

Closed: Kso-Ayaka closed this issue 2 months ago

Kso-Ayaka commented 1 year ago

Dear authors, thanks for your excellent work. I am just getting into binary neural networks. When I use the code for debugging, I find that the parameters are still of type float32, and the code (bnext.py or bnext_quant.py) cannot be run after setting quant=True. I would like to ask whether there is something wrong on my side or whether the code needs to be adjusted. Thank you so much.

NicoNico6 commented 1 year ago

Hi, thanks for your attention.

During the training phase, the weights and activations are handled in a simulated-binarization manner: the data type is still float32 on the GPU even though each value already equals -1 or +1. This is because existing deep learning frameworks such as PyTorch and TensorFlow only support 32/16-bit convolutions for training, so the binary and quantization communities use this simulated-quantization technique to optimize their models.
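For intuition, here is a minimal PyTorch sketch (not the repository's code) showing that simulated binarization keeps the float32 data type while constraining values to -1/+1:

```python
import torch
import torch.nn.functional as F

# Proxy weights: ordinary float32 parameters.
w = torch.randn(8, 3, 3, 3, requires_grad=True)

# Simulated binarization: values become +1/-1, dtype stays float32.
w_bin = torch.sign(w)
print(w_bin.dtype)     # torch.float32
print(w_bin.unique())  # typically tensor([-1., 1.])

# The "binary" convolution still runs as a float32 convolution on GPU.
x = torch.randn(1, 3, 32, 32)
y = F.conv2d(x, w_bin, padding=1)
```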

As we mention in the paper, the weights in the binary convolutions are binarized progressively during training, so the weights are full precision at the beginning but end up binary (+1, -1). You may refer to our paper and [1] for more details on this part.
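As a rough illustration of the idea (the exact annealing scheme is described in the paper), progressive binarization can be realized with a temperature-controlled soft sign that approaches the hard sign function as the temperature goes to 0:

```python
import torch

def soft_sign(w: torch.Tensor, temperature: float) -> torch.Tensor:
    """Soft binarization: tanh(w / t) approaches sign(w) as t -> 0."""
    return torch.tanh(w / temperature)

w = torch.randn(5)
for t in (1.0, 0.1, 0.01):
    print(t, soft_sign(w, t))
# As t approaches 0 the outputs saturate to -1/+1, which matches the
# pretrained checkpoints where |w| is already numerically equal to 1.
```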

If you do not want to train the model from scratch, you can reload the pretrained BNext-Tiny model, where the temperature parameter has already decreased to near 0 and the absolute values of the weights are equal to 1.
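A hedged sketch of reloading the checkpoint and checking this (the class name, constructor arguments, and file name below are placeholders, not the repository's exact API):

```python
import torch
from bnext import BNext  # assumed import; see src/bnext.py for the real class

model = BNext(num_classes=1000)  # constructor args are illustrative
state = torch.load("BNext-Tiny.pth", map_location="cpu")
model.load_state_dict(state, strict=False)

# For a converged checkpoint, the proxy weights of the binary
# convolutions should already sit at +1/-1 (up to numerical tolerance).
for name, p in model.named_parameters():
    if "conv" in name and p.dim() == 4:
        print(name, p.abs().mean().item())  # close to 1.0 for binary convs
```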

As for "quant": we have provided the quantized version of the architecture for better understanding, but the fine-tuning implementation for the extra quantization has not been uploaded yet; the final version will be added after code cleaning. I believe it is not hard to re-implement, since we only use very typical quantization-aware training for the extra post-quantization.
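For reference, quantization-aware training is typically implemented with a "fake quantize" op: values are rounded to an integer grid in the forward pass while a straight-through estimator passes the gradient through unchanged. A minimal sketch (not the repository's fine-tuning code):

```python
import torch

def fake_quantize(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Uniform fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    x_q = torch.round(x / scale).clamp(-qmax, qmax) * scale
    # STE: forward uses x_q, backward sees the identity function.
    return x + (x_q - x).detach()

x = torch.randn(4, requires_grad=True)
y = fake_quantize(x).sum()
y.backward()
print(x.grad)  # all ones: the gradient flows straight through
```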

[1] Guo, N., Bethge, J., Yang, H., Zhong, K., Ning, X., Meinel, C., & Wang, Y. (2021). BoolNet: Minimizing the Energy Consumption of Binary Neural Networks. arXiv preprint arXiv:2106.06991.

Kso-Ayaka commented 1 year ago

Thanks so much for your patient answer! Now I understand how a binary network is trained!

Kso-Ayaka commented 1 year ago

Sorry to bother you again. I tried to download and load BNext-Tiny, but I still cannot understand why the parameters look like this:

[screenshot of the loaded model's parameters]

I am not sure if I misunderstood this sentence: "If you do not want to train the model from scratch, you can reload the pretrained BNext-Tiny model, where the temperature parameter has already decreased to near 0 and the absolute values of the weights are equal to 1."

NicoNico6 commented 1 year ago

Currently, when we say binary neural network, we mean that all the computation-heavy layers except the input and output layers are binarized. The other layers, such as batch normalization and PReLU, are kept in full precision or at more than 1 bit, given their low computation cost.
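Schematically, a typical layout looks like the toy sketch below (this is illustrative, not BNext's actual block design):

```python
import torch.nn as nn

# Toy layout: the first conv and the classifier stay full precision,
# while the middle convolutions are the binarized, computation-heavy
# part (a BinaryConv2d would sign-binarize weights/activations in its
# forward pass; see the STE sketch below).
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),  # input layer: full precision
    nn.BatchNorm2d(64),              # cheap layers stay full precision
    nn.PReLU(64),
    # ... binarized convolution blocks would go here ...
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 1000),             # output layer: full precision
)
```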

To optimize this kind of network with SGD, the community uses the straight-through estimator (STE) for backpropagation: gradients are accumulated in full-precision proxy weights during the backward pass, while in each forward pass the proxy weights are re-binarized through the sign function for simulated binarization.
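A minimal PyTorch sketch of this mechanism (a generic STE with the common gradient clipping, not the repository's exact implementation):

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # hard binarization in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: pass the gradient through,
        # clipped to the region |w| <= 1 (a common choice).
        return grad_output * (w.abs() <= 1).float()

# Gradients accumulate in the full-precision proxy weight:
w = torch.randn(4, requires_grad=True)
loss = BinarizeSTE.apply(w).sum()
loss.backward()
print(w.grad)  # nonzero wherever |w| <= 1
```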

Therefore, simply printing a model summary will not show the "binary representation"; what you get is the "proxy weight" information.

If you want to see the binary representation, please check the forward function https://github.com/hpi-xnor/BNext/blob/ab5a175bce2073cb27e305d9a0ad4851dd5b12bc/src/bnext.py#L107-L116 instead.
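Equivalently, you can recover the binary representation yourself by applying the sign function to the stored proxy weights (the checkpoint file name and key filter below are placeholders):

```python
import torch

state = torch.load("BNext-Tiny.pth", map_location="cpu")
# Pick any binary-convolution weight; the key filter is illustrative.
w = next(v for k, v in state.items() if "conv" in k and v.dim() == 4)
print(w.dtype)                 # torch.float32: the stored proxy weights
print(torch.sign(w).unique())  # the binary representation: -1/+1
```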

You may also want to read some classical BNN papers to learn more about these details.

Kso-Ayaka commented 1 year ago

Thanks for your answer and advice!