A question about one-bit quantization implementation in tensorflow backend.

sands-lab / grace

GRACE - GRAdient ComprEssion for distributed deep learning

https://sands.kaust.edu.sa/project/grace/

BSD 2-Clause "Simplified" License

133 stars, 45 forks

A question about one-bit quantization implementation in tensorflow backend. #12

Closed: shuo-ouyang closed this issue 3 years ago

shuo-ouyang commented 3 years ago

IMHO, the return type of tf.math.less is tf.bool, which is stored with 8 bits (one byte) per element. Hence, the code at grace_dl/tensorflow/compressor/onebit.py#L21 is not really one-bit quantization but eight-bit quantization?
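To make the size argument concrete, here is a small sketch using NumPy as a stand-in for TensorFlow (an assumption for illustration; like tf.bool, NumPy's bool dtype takes one byte per element). The random gradient is hypothetical. A boolean mask over n elements costs n bytes, while packing eight booleans into each uint8 with np.packbits brings it down to ceil(n / 8) bytes, which is what true one-bit storage would look like.

```python
import numpy as np

# Hypothetical flattened float32 gradient with 1000 elements.
grad = np.random.default_rng(0).standard_normal(1000).astype(np.float32)

# Like tf.math.less, an elementwise comparison yields one boolean per element.
mask = grad < 0

# Each boolean occupies a full byte, so the mask is effectively 8 bits per element.
print(mask.dtype.itemsize)  # 1
print(mask.nbytes)          # 1000

# Packing eight booleans into each uint8 gives genuine one-bit storage.
packed = np.packbits(mask)
print(packed.dtype, packed.nbytes)  # uint8 125

# The packed form is lossless: unpacking the first 1000 bits restores the mask.
restored = np.unpackbits(packed, count=mask.size).astype(bool)
```

Compared with the 4000 bytes of the float32 gradient, the byte-per-element mask is a 4x reduction, whereas the packed mask is a 32x reduction.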

hangxu0304 commented 3 years ago

No. We haven't found a way to encode and decode low-bit quantized tensors efficiently. We tried doing it with the native PyTorch/TensorFlow APIs, but the actual throughput ended up worse because of the extra computation overhead. If you really care about how much data is transferred, you may refer to our inefficient implementation here.
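For readers who want actual one-bit payloads regardless of speed, the following is a minimal NumPy sketch of sign-based one-bit quantization with a bit-packed mask. It is not GRACE's implementation. The scheme (negative entries decode to the mean of the negatives, all others to the mean of the non-negatives) and the helper names onebit_encode / onebit_decode are assumptions for illustration. The np.packbits / np.unpackbits calls are exactly the extra encode/decode work that the reply identifies as the throughput cost.

```python
import numpy as np

def onebit_encode(grad):
    """Hypothetical one-bit encoder: sign mask packed 8 per byte, plus two means."""
    flat = grad.reshape(-1)
    neg = flat < 0
    # Mean of each group; fall back to 0.0 if a group is empty.
    mean_neg = flat[neg].mean() if neg.any() else 0.0
    mean_pos = flat[~neg].mean() if (~neg).any() else 0.0
    packed = np.packbits(neg)  # ceil(n / 8) bytes instead of n bytes
    means = np.array([mean_neg, mean_pos], dtype=np.float32)
    return packed, means, grad.shape

def onebit_decode(packed, means, shape):
    """Hypothetical decoder: unpack the mask and map each bit to a group mean."""
    n = int(np.prod(shape))
    # Unpack exactly n bits; padding bits in the last byte are dropped.
    neg = np.unpackbits(packed, count=n).astype(bool)
    flat = np.where(neg, means[0], means[1]).astype(np.float32)
    return flat.reshape(shape)

# Usage: six values fit into a single packed byte.
grad = np.array([[0.5, -1.0, 2.0], [-3.0, 0.0, 1.5]], dtype=np.float32)
packed, means, shape = onebit_encode(grad)
restored = onebit_decode(packed, means, shape)
print(packed.nbytes)  # 1
print(restored)       # negatives become -2.0, the rest become 1.0
```

The payload here is the packed bytes plus two float32 means, so the transferred size really is about one bit per element; the trade-off is the pack/unpack pass on every step.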