shuo-ouyang closed this issue 3 years ago
No. We haven't found a way to do efficient encoding and decoding for low-bit quantization methods. We tried to do it with the native PyTorch/TensorFlow APIs; however, the real throughput was actually worse because of the computation overhead of packing and unpacking. If you really care about how much data is transferred, you may refer to our inefficient implementation here.
IMHO, the return type of the function tf.math.less is tf.bool, which is represented by 8 bits. Hence, the code at grace_dl/tensorflow/compressor/onebit.py#L21 is not really one-bit quantization but eight-bit quantization?
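To make the size difference concrete, here is a minimal pure-Python sketch of what true one-bit packing of a sign mask would look like. It is not the GRACE implementation: the helper names pack_signs and unpack_signs are hypothetical, and the bit order (least significant bit first) is an arbitrary choice for illustration. It only shows that a less-than-zero mask stored as one byte per element takes 8x the space of the same mask packed at one bit per element.

```python
def pack_signs(values):
    """Pack a sign mask (value < 0) into bytes, one bit per element.

    Hypothetical helper for illustration; bit i of the mask is stored
    in byte i // 8 at bit position i % 8 (least significant bit first).
    """
    packed = bytearray((len(values) + 7) // 8)  # round up to whole bytes
    for i, v in enumerate(values):
        if v < 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_signs(packed, count):
    """Recover the first `count` mask entries from the packed bytes."""
    return [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(count)]


values = [-0.5, 1.2, -3.0, 0.0, 2.5, -0.1, 0.7, -2.2, -1.0, 4.0]

# A boolean mask stored as one byte per element (as with tf.bool).
unpacked_size = len(values)        # 10 bytes
# The same mask packed at one bit per element.
packed = pack_signs(values)
packed_size = len(packed)          # 2 bytes

mask = unpack_signs(packed, len(values))
```

The per-element loop here is exactly the kind of extra work the answer above refers to: the packed form is smaller on the wire, but encoding and decoding cost compute, so a real implementation would need a vectorized or fused kernel for the trade to pay off.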