submission2019 / cnn-quantization

Quantization of Convolutional Neural Networks.

Any effect of batch size reduction? #3

Closed ANSHUMAN87 closed 5 years ago

ANSHUMAN87 commented 5 years ago

Hi Author, I highly appreciate the project. I was just wondering: you have mentioned 512 as the default batch size. Will there be any effect if I reduce the batch size to, let's say, 32 or less? I just wanted to know your observation on this point.

Also, I have one query. I used the sample command given on the landing page, as below.

python inference/inference_sim.py -a resnet50 -b 512 -pcq_w -pcq_a -sh --qtype int4 -qw int4 -c laplace -baa -baw -bcw

The accuracy is as you have mentioned. But when I saved the model after validation, the model size stayed almost the same; it was reduced by at most 80KB. Is this consistent with your observation, or did I do something wrong?

ANSHUMAN87 commented 5 years ago

@submission2019, @ynahshan: Would you please respond to the above 2 queries?

submission2019 commented 5 years ago

Hi. We use statistical methods to find optimal quantization parameters, so changing the batch size could potentially affect results by providing less data to correctly estimate the standard deviation, but this effect should be minor. You are welcome to try it and use whatever batch size you want above 32, but for best performance it is highly recommended to use the maximal batch size your GPU can fit.
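For context, here is a minimal sketch of the kind of batch statistics involved. This is not the repository's code; the Laplace-scale estimate and the 4-bit clipping multiplier (taken from the ACIQ analysis) are assumptions used purely for illustration:

```python
import torch

def laplace_clip_value(x: torch.Tensor, alpha_mult: float = 5.03) -> float:
    """Estimate a clipping threshold for quantization from batch statistics.

    The Laplace scale b is estimated as the mean absolute deviation of the
    activations; the clip value is a fixed multiple of b (alpha_mult is
    assumed here to be the 4-bit optimum from the ACIQ analysis). A larger
    batch gives a more stable estimate of b, which is why a large batch
    size is recommended.
    """
    b = (x - x.mean()).abs().mean()
    return (alpha_mult * b).item()

# Hypothetical usage: activations collected from one batch
acts = torch.randn(512, 256)        # stand-in for a layer's activations
print(laplace_clip_value(acts))     # clip threshold used to build the int4 grid
```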

PyTorch saves model parameters as fp32, so you won't see any difference in the model size saved to disk. It is possible to save the model as int4, but that requires a custom implementation that takes the quantized parameters of the model and writes them as int4 to some binary format, together with the scale/shift values. We don't have this implemented in our code.
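For illustration only, a minimal sketch of what such packing could look like. None of this exists in the repository; the helper names and the simple per-tensor scale/shift scheme are assumptions:

```python
import numpy as np

def pack_int4(q: np.ndarray) -> bytes:
    """Pack an array of int4 codes (values 0..15) into bytes, two codes per byte."""
    q = q.astype(np.uint8).ravel()
    if q.size % 2:                                   # pad to an even number of codes
        q = np.concatenate([q, np.zeros(1, dtype=np.uint8)])
    packed = (q[0::2] << 4) | q[1::2]
    return packed.astype(np.uint8).tobytes()

def save_int4_tensor(path: str, w_fp32: np.ndarray, scale: float, shift: float) -> None:
    """Quantize fp32 weights with an assumed per-tensor scale/shift and store packed int4."""
    q = np.clip(np.round(w_fp32 / scale + shift), 0, 15)
    with open(path, "wb") as f:
        np.array([scale, shift], dtype=np.float32).tofile(f)   # scale/shift header
        np.array([w_fp32.size], dtype=np.int64).tofile(f)      # element count
        f.write(pack_int4(q))

# Hypothetical usage on one layer's weights:
# save_int4_tensor("conv1.int4", model.conv1.weight.detach().numpy(), scale=0.02, shift=8.0)
```

The fp32 checkpoint stays the same size because each 4-bit code is still stored in a 32-bit float; a packed format like the sketch above is what would actually shrink the file by roughly 8x.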

ANSHUMAN87 commented 5 years ago

@submission2019: Thanks a lot for your response, it was really helpful. I have one more concern. During execution I could see that some values are quantized to 5 bits (based on the values returned from int_quantizer.get_bits_alloc()). So I think some channels in the same layer are quantized to 5 bits while others use 4 bits. Would you please confirm this point? If that is the case, how should such a situation be handled? Without uniformity within a layer, it seems difficult to handle in an implementation, even for storage and retrieval.

Eagerly waiting for your reply. Thanks!

submission2019 commented 5 years ago

This is correct. We introduce an optimal bit allocation algorithm where each channel is quantized using a different number of bits. On average, per tensor, we preserve the 4-bit budget, but some channels may be quantized to 5 bits and others to 3 bits. This approach minimizes the overall MSE due to quantization, since channels with smaller variance require lower resolution. This feature can be enabled/disabled by specifying --baa or --baw.
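For intuition, here is a minimal sketch of variance-aware bit allocation under a fixed average budget. It illustrates the general idea only; it is not the repository's get_bits_alloc() implementation, and the rate-distortion-style formula and the parameter names are assumptions:

```python
import torch

def allocate_bits(weights: torch.Tensor, avg_bits: int = 4,
                  min_bits: int = 2, max_bits: int = 8) -> torch.Tensor:
    """Assign a bit width per output channel so the mean stays near avg_bits.

    Channels with larger variance get more bits, following the classic
    rate-distortion rule b_i = avg_bits + 0.5 * log2(var_i / geometric_mean(var)).
    """
    # Per-channel variance; output channels assumed on dim 0 (e.g. conv weight OIHW).
    var = weights.flatten(1).var(dim=1) + 1e-12
    geo_mean = var.log().mean().exp()
    bits = avg_bits + 0.5 * torch.log2(var / geo_mean)
    return bits.round().clamp(min_bits, max_bits).to(torch.int64)

# Hypothetical usage on a conv layer's weight tensor:
w = torch.randn(64, 3, 7, 7)
print(allocate_bits(w))   # e.g. tensor([4, 5, 3, ...]) averaging roughly 4
```

For storage, such a scheme implies keeping a small per-channel metadata vector (bit width plus scale/shift) alongside the packed weights, which is the non-uniformity you observed.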