Closed shanyang0509 closed 3 months ago
Dear Shan Yang,
I will try to answer the second question. You cannot easily test the speed using existing open-source software toolkits, but we are working on it. We plan to support both CPU and GPU hardware, so please stay tuned.
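Until such a toolkit is available, one generic way to get a rough CPU latency number is to time repeated forward passes after a warm-up. This is only a sketch, not the planned toolkit: the workload below is a placeholder, and in practice `fn` would be something like `lambda: model(dummy_input)` under `torch.inference_mode()`.

```python
import time

def benchmark_cpu(fn, warmup: int = 5, iters: int = 50) -> float:
    """Return the average latency (seconds) of fn() over `iters` runs,
    after `warmup` untimed calls to stabilize caches/allocations."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Placeholder workload standing in for a model forward pass.
avg = benchmark_cpu(lambda: sum(i * i for i in range(10_000)))
print(f"avg latency: {avg * 1e3:.3f} ms")
```

Note that latency measured this way reflects the simulated binary layers (fp32 arithmetic), not true 1-bit kernels, so it will not show the speed-up a real BNN deployment would achieve.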
Hi Shan Yang,
For your first question: if you check the code here (https://github.com/hpi-xnor/BNext/blob/dfcf347a30e3bc08606b8cad2c8d4a329d5a5b28/src/train_assistant_group_amp.py#L648-L662), we save not only the model state_dict but also the optimizer state_dict and the training-procedure information, which explains why the checkpoint is much larger than the model itself.
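As a rough sanity check (my assumption, not from the repo: fp32 weights and an AdamW-style optimizer, which keeps two extra fp32 buffers, exp_avg and exp_avg_sq, per parameter), the relative sizes can be estimated like this:

```python
# Back-of-the-envelope estimate: checkpoint size vs. bare model size.
# Assumes fp32 weights (4 bytes each) and an AdamW-style optimizer that
# stores two extra fp32 tensors per parameter (exp_avg, exp_avg_sq).

def estimate_sizes_mb(num_params: int, bytes_per_param: int = 4) -> dict:
    model_mb = num_params * bytes_per_param / 1024**2
    optimizer_mb = 2 * model_mb  # two fp32 state buffers per parameter
    return {"model": model_mb, "checkpoint": model_mb + optimizer_mb}

sizes = estimate_sizes_mb(106_100_000)  # BNext-L: ~106.1M parameters
print(f"model ~ {sizes['model']:.0f} MB, checkpoint ~ {sizes['checkpoint']:.0f} MB")
```

So a fp32 model file of roughly 400 MB plus optimizer state lands near the GB range; scheduler, scaler, and teacher-assistant state would add further overhead on top of this estimate.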
For your second question: the released model is still saved using the torch.save() function with 32-bit representations. In this case, it is impossible to directly obtain a 106.1M-parameter-sized BNext-L file using the torch library, even though all weights in HardBinaryConv are represented as +1/-1. We plan to release a BNN-specific torch extension toolkit in the near future, so please stay tuned.
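For illustration only (this is not the planned toolkit), +1/-1 weights can in principle be packed eight per byte, which is where the 32x saving over fp32 storage comes from:

```python
# Illustrative only: pack a list of +1/-1 weights into bytes (1 bit each),
# versus the 4 bytes each weight occupies in a 32-bit float checkpoint.

def pack_binary_weights(weights):
    """Map +1 -> bit 1, -1 -> bit 0, packing 8 weights per byte (MSB first)."""
    packed = bytearray()
    for i in range(0, len(weights), 8):
        byte = 0
        for bit, w in enumerate(weights[i:i + 8]):
            if w == 1:
                byte |= 1 << (7 - bit)
        packed.append(byte)
    return bytes(packed)

weights = [1, -1, 1, 1, -1, -1, -1, 1] * 4  # 32 binary weights
packed = pack_binary_weights(weights)
print(len(weights) * 4, "bytes as fp32 vs", len(packed), "bytes packed")
```

A real deployment would also need to keep the non-binary parts of the network (e.g. scaling factors and batch-norm parameters) in higher precision, so the practical saving is somewhat below 32x.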
Thanks for your answers!
Please check the binary layers implemented in bitorch-engine.
Sorry to disturb you again. When I train with the provided script "run_distributed_on_disk_a6k5_AdamW_Curicullum_Large_assistant_teacher_num_3_aa.sh", the saved checkpoint is 3715.30 MB, while the pretrained BNext-L model file is 1246.96 MB. Is something wrong on my side? Can you help me? Furthermore, the table in the paper says that BNext-L has 106.1M parameters; what explains the difference?
There is another problem: how can I test the quantized model's speed on the CPU? Can you give me some advice? Thank you so much!