submission2019 / cnn-quantization

Quantization of Convolutional Neural Networks.

Advantage of the 4-bit Quantization #4

Open amitsrivastava78 opened 5 years ago

amitsrivastava78 commented 5 years ago

Hi @submission2019, first of all I would like to congratulate you on this paper and on opening up the GitHub project for analysis. I have gone through your paper and the GitHub project in depth, and I would like to know the following:

  1. What is the advantage of this approach over 8-bit quantization? Since all operations should be byte aligned, mathematical operations should be at least 8-bit, and storage also seems to be 8-bit aligned, so I cannot see where the advantage of 4-bit quantization lies. Also, I can see a drop in accuracy of about 2~3% compared to 8-bit quantization.

So maybe there is a bigger picture which I am not able to see; can you please point me in the right direction?

Regards, Amit

submission2019 commented 5 years ago

Hello. The advantage of 4-bit weights and activations is the 2x reduction in bandwidth. Many neural network workloads are bandwidth bound, so reducing the number of bits increases throughput and reduces power consumption.

Of course, in order to benefit from 4-bit quantization we need dedicated HW that supports manipulations at a resolution lower than a byte (8 bits). Some HW vendors already offer experimental HW/features for enthusiasts to experiment with int4. For example, NVIDIA added support for the int4/uint4 data types as part of the CUDA 10 Tensor Cores HW. On the other hand, a lot of academic and industrial research focuses on methods that bring the accuracy of int4 inference close to int8. The goal of our work is to suggest and evaluate such methods, allowing int4 inference of convolutional neural networks with relatively small accuracy degradation.
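To make this concrete, below is a minimal sketch of symmetric uniform fake-quantization in PyTorch. It is illustrative only, not the repo's implementation: the `quantize_uniform` helper is hypothetical, and the naive max-based clipping threshold `alpha` is exactly what a careful 4-bit method would replace with a better-chosen clip value.

```python
import torch

def quantize_uniform(x, n_bits=4, alpha=None):
    """Clip x to [-alpha, alpha], round onto a 2^n_bits integer grid,
    then dequantize back to float to simulate low-bit inference."""
    if alpha is None:
        alpha = x.abs().max().item()   # naive clipping at the tensor max
    qmax = 2 ** (n_bits - 1) - 1       # 7 for 4 bits, 127 for 8 bits
    scale = alpha / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale

x = torch.randn(10000)
for bits in (8, 4):
    mse = torch.mean((x - quantize_uniform(x, bits)) ** 2)
    print(f"{bits}-bit MSE: {mse.item():.6f}")
```

Running this shows the quantization error growing sharply from 8 to 4 bits, which is why choosing the clipping threshold carefully matters far more at int4 than at int8.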

amitsrivastava78 commented 5 years ago

@submission2019, @ynahshan, thanks for pointing me in the right direction. The paper looks promising; have you thought about commercializing this solution in any product? Also, using your algorithm on MobileNet the accuracy is very low; can you throw some light on this?

Regards, Amit

submission2019 commented 5 years ago

Hi. We didn't try to apply our methods to MobileNet, so I don't know the reason for the poor results you observe. It could be related to the depthwise convolutions that MobileNet mostly consists of. Unfortunately, given the diversity of deep learning models, it is often necessary to analyse the model and fine-tune the quantization methods for a specific model.

amitsrivastava78 commented 5 years ago

@submission2019, @ynahshan, thanks for the reply; I am closing this issue. If I manage to improve MobileNet accuracy, I will post the code and method here as well.

Regards, Amit

amitsrivastava78 commented 5 years ago

@submission2019, thanks for the reply on the MobileNet part. Yes, we are facing the same low-accuracy issue with MobileNetV2. Can you please describe the measures you have taken? For us, the Top-1 accuracy at 4 bits for mobilenet_v2 comes out to ~49%; can you please give the exact steps for bringing it to 70%?

Regards, Amit

limerainne commented 5 years ago

Dear @amitsrivastava78,

In my previous comment, I made a mistake in the test (I accidentally set the bit width to 8 bits), which produced an incorrectly high accuracy.

Sorry for the wrong information, and for deleting my comment without proper notice.

P.S. To avoid confusion (since the authors were referred to in your comment): I am not affiliated with the authors.

jonathanbonnard commented 4 years ago

Hi, I have encountered the same problem with MobileNetV2, and I think I know where it is. The program quantizes the 3rd sub-layer (aka the linear bottleneck), but it should not. The output of this sub-layer has to be kept at 4 bits + 4 bits + log2(nb_out_channels) bits; otherwise the dynamic range will be clipped, and this leads to wrong input values for the next 1x1 convolution. However, I don't know where the program should be modified to change this behaviour... Maybe the authors can help?
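To make that concrete, here is a hedged sketch of how one might collect MobileNetV2's 1x1 projection (linear bottleneck) convolutions so a quantizer can skip them or give them a wider format. It assumes torchvision's MobileNetV2; the `skip_quant` set is hypothetical, and where to feed it into this repo's quantization code is exactly the open question above.

```python
import torch.nn as nn
from torchvision.models import mobilenet_v2

model = mobilenet_v2().eval()

# In torchvision's MobileNetV2, each InvertedResidual block ends with a
# 1x1 projection conv (the linear bottleneck) followed by BatchNorm and
# no activation. Collect those convs so they can be excluded from 4-bit
# quantization, since clipping their output destroys the dynamic range
# feeding the next block.
skip_quant = set()
for m in model.modules():
    if m.__class__.__name__ == "InvertedResidual":
        convs = [c for c in m.conv.modules() if isinstance(c, nn.Conv2d)]
        skip_quant.add(convs[-1])      # last 1x1 conv = linear bottleneck

print(f"{len(skip_quant)} projection convs excluded from quantization")
```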

ghost commented 4 years ago

Hi, I want to save the quantized model and analyze its metrics such as inference time, model size, FLOPs, and number of parameters. Can anyone give me some advice, or have you already done this? @amitsrivastava78 @submission2019 @limerainne @jonathanbonnard @ynahshan Thanks a lot!
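Until someone posts a full answer, here is a rough sketch of the easy parts. The model here is a stand-in; since this repo simulates quantization in fp32, saved weights stay fp32 and the measured latency will not show a real int4 speedup, so the size numbers are back-of-the-envelope estimates. FLOPs are best taken from an external counter such as the ptflops package or fvcore's FlopCountAnalysis.

```python
import time
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2().eval()

# Saving: a (fake-)quantized model can be stored like any other; the
# weights are still fp32 tensors holding quantized values.
torch.save(model.state_dict(), "quantized_model.pth")

# Parameter count and idealized storage sizes (fp32 = 4 bytes/weight,
# packed int4 = 0.5 bytes/weight).
n_params = sum(p.numel() for p in model.parameters())
print(f"params: {n_params / 1e6:.2f} M")
print(f"fp32: {n_params * 4 / 1e6:.1f} MB, ideal int4: {n_params * 0.5 / 1e6:.1f} MB")

# Wall-clock latency on a single image, with warm-up runs first.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    for _ in range(10):
        model(x)
    t0 = time.perf_counter()
    for _ in range(50):
        model(x)
print(f"latency: {(time.perf_counter() - t0) / 50 * 1000:.1f} ms/image")
```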