yhhhli / APoT_Quantization

PyTorch implementation for the APoT quantization (ICLR 2020)

Can you provide pre-trained ResNet-18 model ? #1

Closed sijeh closed 4 years ago

sijeh commented 4 years ago

Hello, thanks for your contribution to the network quantization field and for your open-source code. I met some problems training the ResNet-18 model (quantizing both weights and activations to 4 bits) on the ImageNet dataset: the final best accuracy is about 68.1%. I kept all the hyper-parameters the same as in the code except for batch_size, due to GPU capacity; 3 RTX 2080 Ti GPUs are used for training and the batch_size is set to 196. I wonder if something was wrong in my training, and I would appreciate it if you could provide a pre-trained ResNet-18 model to help find the problem.

yhhhli commented 4 years ago

Hello sijeh, sorry for the late reply and thanks for your report. I am planning to revise the code and provide the ResNet-18 checkpoints. I will comment here when the code is updated and the checkpoints are uploaded.

sijeh commented 4 years ago

Thx.

yhhhli commented 4 years ago

Hi sijeh, I just uploaded the checkpoints for the 4-bit ResNet-18 and the new code for APoT quantization! Here are the changes:

Regarding your question about batch size: theoretically speaking, the LR should scale linearly with the batch size, because a smaller batch size results in more training iterations per epoch. Therefore, you may use 0.01 * 192 / 1024 as your base LR.
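A minimal sketch of that linear scaling rule, assuming (as the comment above implies) a reference base LR of 0.01 at batch size 1024; the function name is hypothetical, not part of the repository:

```python
# Hypothetical helper illustrating the linear LR scaling rule from the comment:
# LR is scaled proportionally to the ratio of actual to reference batch size.
def scaled_lr(base_lr, base_batch_size, actual_batch_size):
    """Scale the learning rate linearly with the batch size."""
    return base_lr * actual_batch_size / base_batch_size

# Values from the comment: base LR 0.01 at batch size 1024, actual batch size 192.
lr = scaled_lr(0.01, 1024, 192)
print(lr)  # → 0.001875
```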

If you still have further questions, please do not hesitate to comment here.

sijeh commented 4 years ago

Hi yhhhli, thanks for your detailed reply and for updating the open-source code and pre-trained model. Everything works correctly now that I re-downloaded and unzipped the ImageNet dataset.