In the Quantize function (binarizedmodules.py, line 57), I don't quite understand why the range for tensor.clamp() is from -128 to 128 if I want to quantize with numBits=8. Since all the outputs from previous layers go through a Hardtanh function, shouldn't they be in the range [-1, 1] instead? Also, how are the values converted to 8 bits if they are in the range [-128, 128]? For example, if the input tensor is 127.125 and numBits=8, then tensor.mul(2**(numBits-1)).round().div(2**(numBits-1)) gives me 127.1250. How is that stored in 8 bits?
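To make the arithmetic concrete, here is a minimal sketch that reproduces the quoted expression with plain Python floats (the function name and structure are my own reconstruction of the quoted ops, not the repo's actual code):

```python
def quantize(x, num_bits=8):
    """Reproduce the quoted steps: clamp to [-128, 128], then
    snap to a grid with step size 1 / 2**(num_bits - 1)."""
    scale = 2 ** (num_bits - 1)           # 128 when num_bits == 8
    x = max(-128.0, min(128.0, x))        # tensor.clamp(-128, 128)
    return round(x * scale) / scale       # mul(...).round().div(...)

# 127.125 is already a multiple of 1/128, so it passes through unchanged,
# which is exactly the behavior described in the question above.
print(quantize(127.125))   # 127.125
print(quantize(0.3))       # 0.296875 (nearest multiple of 1/128)
```

This shows the puzzle in the question: the output keeps both a large integer part and fractional bits, so a value like 127.125 cannot be represented in 8 bits unless the expected input range is actually [-1, 1] rather than [-128, 128].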