mit-han-lab / haq

[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision
https://hanlab.mit.edu/projects/haq/
MIT License
365 stars 85 forks

Regarding paper and codes #7

Open · yzhang93 opened this issue 4 years ago

yzhang93 commented 4 years ago

After diving into the code and the paper, I have two questions.

  1. The paper states: "If the current policy exceeds our resource budget (on latency, energy or model size), we will sequentially decrease the bitwidth of each layer until the constraint is finally satisfied." Where in the code is this implemented, i.e., where is a layer's bitwidth decreased when the current policy exceeds the budget?

  2. Why don't you use k-means quantization for the latency/energy-constrained experiments? Will you release the code for linear quantization?
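For reference, the budget-enforcement step quoted in question 1 can be sketched as follows. This is a minimal illustration of the rule described in the paper, not the repo's actual code; the function name `enforce_budget`, the model-size cost model, and the minimum bitwidth of 2 are all assumptions.

```python
def enforce_budget(bitwidths, layer_sizes, budget_bits, min_bit=2):
    """Greedily decrease per-layer bitwidths until the model-size budget
    is met, walking over the layers in order (hypothetical sketch of the
    rule quoted from the paper)."""
    bitwidths = list(bitwidths)

    def cost(bits):
        # Model size in bits: bitwidth * number of parameters per layer.
        return sum(b * s for b, s in zip(bits, layer_sizes))

    while cost(bitwidths) > budget_bits:
        changed = False
        for i in range(len(bitwidths)):
            if bitwidths[i] > min_bit:
                bitwidths[i] -= 1
                changed = True
                if cost(bitwidths) <= budget_bits:
                    break
        if not changed:
            break  # every layer is already at min_bit; the budget cannot be met
    return bitwidths
```

For example, with three layers of 100 parameters each and a 2000-bit budget, an 8/8/8 policy is reduced to 6/7/7 (cost 2000 bits).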

haibao-yu commented 4 years ago

Hi, I have the second question as well. Also, were you able to reproduce the quantization method? I reimplemented the linear quantization method described in Section 3.4 of the paper on CIFAR-10 + ResNet-20, but it did not work.

lydiaji commented 4 years ago

I find that the code uses k-means quantization, while the paper says to find the optimal clip value that minimizes the KL divergence between the non-quantized and quantized weights/activations, i.e., linear quantization. This differs from what is shown in the code.
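The clip-selection rule quoted above can be sketched roughly like this. It is an illustrative sketch, not HAQ's implementation; the candidate-clip grid, histogram binning, and function names are assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """KL divergence between two (unnormalized) histograms."""
    p = p / p.sum()
    q = q / q.sum()
    return np.sum(p * np.log((p + eps) / (q + eps)))

def find_best_clip(w, n_bits, n_candidates=50, bins=128):
    """Search candidate clip values and keep the one whose linearly
    quantized distribution is closest, in KL divergence, to the original
    (sketch of the selection rule quoted from the paper)."""
    qmax = 2 ** (n_bits - 1) - 1
    best_clip, best_kl = None, np.inf
    for clip in np.linspace(0.2, 1.0, n_candidates) * np.abs(w).max():
        scale = clip / qmax
        q = np.clip(np.round(w / scale), -qmax, qmax) * scale
        hist_w, edges = np.histogram(w, bins=bins)
        hist_q, _ = np.histogram(q, bins=edges)  # same bin edges for both
        kl = kl_divergence(hist_w.astype(float), hist_q.astype(float))
        if kl < best_kl:
            best_kl, best_clip = kl, clip
    return best_clip
```

The chosen clip then fixes the scale of the uniform quantization grid for that layer.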

mepeichun commented 4 years ago

> I find that the code uses k-means quantization, while the paper says to find the optimal clip value that minimizes the KL divergence between the non-quantized and quantized weights/activations, i.e., linear quantization. This differs from what is shown in the code.

This confuses me as well. The paper uses linear quantization, but the code provides k-means quantization (similar to Deep Compression). After k-means quantization, we cannot guarantee that the weights map onto fixed-point arithmetic units.
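To illustrate the point: k-means quantization snaps each weight to one of 2^b learned centroids, which are arbitrary floats (a lookup table), whereas linear quantization places weights on a uniform fixed-point grid. A minimal 1-D k-means sketch (function name and initialization details are assumptions, not the repo's code):

```python
import numpy as np

def kmeans_quantize(w, n_bits, n_iter=10):
    """Quantize a 1-D weight array to 2^n_bits centroids via 1-D k-means.
    The centroids end up at arbitrary float positions, so the result is a
    codebook lookup, not fixed-point arithmetic (the concern raised above)."""
    codebook = np.linspace(w.min(), w.max(), 2 ** n_bits)  # linear init
    for _ in range(n_iter):
        # Assign each weight to its nearest centroid.
        assign = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        # Move each centroid to the mean of its assigned weights.
        for k in range(len(codebook)):
            if np.any(assign == k):
                codebook[k] = w[assign == k].mean()
    return codebook[assign]
```

With `n_bits=1` the weights collapse to just two values, but those values are cluster means rather than points on a uniform grid.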

lcmeng commented 4 years ago

It's quite unfortunate that the main novelty claimed by the paper, i.e., the use of direct hardware feedback, is conveniently missing from this repo. In fact, even the paper fails to provide a clear explanation of that claim.

kuan-wang commented 4 years ago

We have added linear quantization as well as the hardware resource-constrained part to this repo. Please let us know if you have any questions.

lcmeng commented 4 years ago

Can you please point to the part where the direct HW feedback is used? Thanks. Without that, the repo is still quite limited in significance.

kuan-wang commented 4 years ago

Thanks for your feedback! You can find the related code at https://github.com/mit-han-lab/haq/blob/7141586e9ae47c8a50aa8b596ab37682a06b434a/lib/env/linear_quantize_env.py#L306