yhhhli / APoT_Quantization

PyTorch implementation for the APoT quantization (ICLR 2020)

Why is the size of Res20_2bit the same as that of Res20_32bit? #10

gogo03 · closed 3 years ago

yhhhli commented 3 years ago

Hi, our current implementation uses fake quantization: the quantized weights are still stored in FP32 format on the GPU, but their values are restricted to a small set of fixed quantization levels, so the checkpoint size does not shrink. Enabling real quantization on GPU would require dedicated CUDA kernels for low-bit computation, which is not necessary for researchers who just want to test their algorithm.
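For illustration, here is a minimal uniform fake-quantization sketch (this is not the repo's APoT quantizer; `fake_quantize` and `num_bits` are hypothetical names). It shows why the saved model is the same size as the FP32 one: the output tensor keeps the `float32` dtype even though only a few distinct values remain.

```python
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 2) -> torch.Tensor:
    """Uniform symmetric fake quantization: values are snapped to a small
    set of levels, but the returned tensor is still FP32."""
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 1 for 2-bit signed
    scale = w.abs().max().clamp(min=1e-8) / qmax   # per-tensor scale
    # Round to integer grid, clamp to the signed range, then rescale.
    w_q = torch.round(w / scale).clamp(-qmax, qmax) * scale
    return w_q                                     # dtype is still torch.float32

if __name__ == "__main__":
    w = torch.randn(4, 4)
    w_q = fake_quantize(w, num_bits=2)
    print(w_q.dtype)                   # torch.float32 -> same storage size
    print(torch.unique(w_q).numel())   # only a handful of distinct values
```

Because the tensor is stored as FP32, the on-disk size of a "2-bit" checkpoint matches the 32-bit one; only the set of representable values is reduced.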