Closed shanyang0509 closed 3 months ago
Dear Shan Yang,
I will try to answer the second question. You cannot easily test the speed using existing open-source software toolkits, but we are working on it. We plan to support both CPU and GPU hardware, so please stay tuned.
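Until such a toolkit is available, one generic way to get a rough CPU latency number is to time repeated forward passes after a warm-up. This is only a sketch, not the planned toolkit: the workload below is a placeholder, and in practice `fn` would be something like `lambda: model(dummy_input)` under `torch.inference_mode()`.

```python
import time

def benchmark_cpu(fn, warmup: int = 5, iters: int = 50) -> float:
    """Return the average latency (seconds) of fn() over `iters` runs,
    after `warmup` untimed calls to stabilize caches/allocations."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Placeholder workload standing in for a model forward pass.
avg = benchmark_cpu(lambda: sum(i * i for i in range(10_000)))
print(f"avg latency: {avg * 1e3:.3f} ms")
```

Note that latency measured this way reflects the simulated binary layers (fp32 arithmetic), not true 1-bit kernels, so it will not show the speed-up a real BNN deployment would achieve.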
Hi Shan Yang,
For your first question: if you check the code here (https://github.com/hpi-xnor/BNext/blob/dfcf347a30e3bc08606b8cad2c8d4a329d5a5b28/src/train_assistant_group_amp.py#L648-L662), we save not only the model state_dict but also the optimizer state_dict and the training-procedure information, which explains why the checkpoint is much larger than the model itself.
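As a rough sanity check (my assumption, not from the repo: fp32 weights and an AdamW-style optimizer, which keeps two extra fp32 buffers, exp_avg and exp_avg_sq, per parameter), the relative sizes can be estimated like this:

```python
# Back-of-the-envelope estimate: checkpoint size vs. bare model size.
# Assumes fp32 weights (4 bytes each) and an AdamW-style optimizer that
# stores two extra fp32 tensors per parameter (exp_avg, exp_avg_sq).

def estimate_sizes_mb(num_params: int, bytes_per_param: int = 4) -> dict:
    model_mb = num_params * bytes_per_param / 1024**2
    optimizer_mb = 2 * model_mb  # two fp32 state buffers per parameter
    return {"model": model_mb, "checkpoint": model_mb + optimizer_mb}

sizes = estimate_sizes_mb(106_100_000)  # BNext-L: ~106.1M parameters
print(f"model ~ {sizes['model']:.0f} MB, checkpoint ~ {sizes['checkpoint']:.0f} MB")
```

So a fp32 model file of roughly 400 MB plus optimizer state lands near the GB range; scheduler, scaler, and teacher-assistant state would add further overhead on top of this estimate.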
For your second question: the released model is still saved using the torch.save() function with 32-bit representations. In this case, it is impossible to directly obtain a 106.1M-parameter-sized BNext-L file using the torch library, even though all weights in HardBinaryConv are represented as +1/-1. We plan to release a BNN-specific torch extension toolkit in the near future, so please stay tuned.
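For illustration only (this is not the planned toolkit), +1/-1 weights can in principle be packed eight per byte, which is where the 32x saving over fp32 storage comes from:

```python
# Illustrative only: pack a list of +1/-1 weights into bytes (1 bit each),
# versus the 4 bytes each weight occupies in a 32-bit float checkpoint.

def pack_binary_weights(weights):
    """Map +1 -> bit 1, -1 -> bit 0, packing 8 weights per byte (MSB first)."""
    packed = bytearray()
    for i in range(0, len(weights), 8):
        byte = 0
        for bit, w in enumerate(weights[i:i + 8]):
            if w == 1:
                byte |= 1 << (7 - bit)
        packed.append(byte)
    return bytes(packed)

weights = [1, -1, 1, 1, -1, -1, -1, 1] * 4  # 32 binary weights
packed = pack_binary_weights(weights)
print(len(weights) * 4, "bytes as fp32 vs", len(packed), "bytes packed")
```

A real deployment would also need to keep the non-binary parts of the network (e.g. scaling factors and batch-norm parameters) in higher precision, so the practical saving is somewhat below 32x.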
Thanks for your answers!
Please check the binary layers implemented in bitorch-engine.
Sorry to disturb you again. When I train with the provided script "run_distributed_on_disk_a6k5_AdamW_Curicullum_Large_assistant_teacher_num_3_aa.sh", the saved checkpoint is 3715.30 MB, while the pretrained BNext-L model file is 1246.96 MB. Is something wrong on my side? Can you help me? Furthermore, the table in the paper says that BNext-L has 106.1M parameters; what explains the difference?
There is another problem: how can I test the quantized model's speed on the CPU? Can you give me some advice? Thank you so much!