Closed AdamHillier closed 4 years ago
I roughly implemented the required changes in the converter, which was rather painless. However, for the conversion patterns that read and write the bitpacked output of the `bconv`, it might make sense to discuss some of the implementation details beforehand.
We briefly discussed the three possible implementations before, but I think it might be useful to have them explicitly written down here for reference:
**Option 1:** The main idea here is that we introduce a `quantize` and a `dequantize` op to replace the current `bsign` op (alternative names would be `bsign`/`dequantize`, `binarize`/`dequantize`, `pack_bits`/`unpack_bits`, `pack`/`unpack`, ...). The signatures of the ops would be `{float32, int8} -> {int32}` and `{int32} -> {float32, int8}` for `quantize` and `dequantize` respectively. This is very close to how TFLite and MLIR handle int8 quantization, and it is how the converter will handle it internally in any case, regardless of which possibility we pick for the kernels.

While this would introduce a new op in LCE, the only other downside I see is that `float32 -> float32` binarization would potentially become slower, since it would turn into `float32 -> int32 -> float32`. In general I don't think this is too problematic, as `float32 -> float32` binarization is not really useful in practice anyway.
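To make the first option concrete, here is a minimal NumPy sketch of what a `quantize`/`dequantize` pair could look like. The function names, bit order, and padding scheme are illustrative assumptions, not the actual LCE kernel implementation:

```python
import numpy as np

def quantize(x):
    # Hypothetical `quantize`: pack the signs of a float32/int8 tensor
    # into int32 words, one bit per element (bit set when negative).
    flat = x.reshape(-1)
    flat = np.pad(flat, (0, -len(flat) % 32))  # fill the last word
    bits = (flat < 0).astype(np.uint8)
    return np.packbits(bits, bitorder="little").view(np.int32)

def dequantize(packed, n, dtype=np.float32):
    # Hypothetical `dequantize`: unpack int32 words back to +1/-1
    # values (the first n elements) in float32 or int8.
    bits = np.unpackbits(packed.view(np.uint8), bitorder="little")[:n]
    return np.where(bits == 1, -1, 1).astype(dtype)
```

Under this sketch, the `float32 -> float32` round trip mentioned above is literally `dequantize(quantize(x), x.size)`, which is why it would be slightly slower than a direct sign op.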
**Option 2:** This option would combine the previous approach into a single `bsign` op with the signature `{float32, int8, int32} -> {float32, int8, int32}`. While this makes the kernel implementation a bit more complicated, it would remove the need to add a new op.
**Option 3:** This is the option which is closest to the current approach: we would add the possibility to output a bitpacked tensor, so that the signature of the `bsign` op would become `{float32, int8} -> {float32, int8, int32}`. With this option we need to be a bit careful in the converter and make sure to only convert `tfl.maxpool_2d -> lq.bsign` to `lq.bsign -> lq.bmaxpool_2d` in cases where it is followed by a `bconv` op that supports bitpacked `int32` input, since otherwise the converter would output a broken graph.
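The guard the converter would need for the third option could look roughly like this sketch; the `Op` class, op names, and the `supports_bitpacked_input` flag are hypothetical stand-ins for the real converter data structures:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Op:
    name: str
    supports_bitpacked_input: bool = False
    consumers: List["Op"] = field(default_factory=list)

def can_swap_maxpool_and_bsign(bsign: Op) -> bool:
    # The rewrite tfl.maxpool_2d -> lq.bsign  =>  lq.bsign -> lq.bmaxpool_2d
    # changes the dtype flowing out of the pair from float32/int8 to
    # bitpacked int32, so it is only safe when every consumer can read
    # bitpacked int32 input; otherwise the rewrite breaks the graph.
    return all(c.name == "lq.bconv2d" and c.supports_bitpacked_input
               for c in bsign.consumers)
```

The key point is that the check must cover *all* consumers of the `bsign` output, not just one, before the pattern is allowed to fire.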
At the moment, our `bconv` op can read either float, int8, or bitpacked inputs, and can write either float, int8, or bitpacked outputs. When the input is not bitpacked, the first step that the op performs is calling a bitpacking procedure. Conceptually, this is almost like a 'sub-op' which doesn't need to be the responsibility of the `bconv`. We could instead make the `bconv` op accept only bitpacked inputs, and insert dedicated `bsign` ops where needed that take float or int8 inputs and produce bitpacked outputs. A `bsign` op already exists in our code base, but we generally fuse it into a subsequent `bconv` and so don't use it directly. This would have several advantages:
- It would simplify the `bconv` op implementation. This would reduce nine combinations of input/output types to just three, improve the 'separation of concerns' in our ops, and improve codebase readability and approachability.
- It would make the `bsign` op similar to a TFL cast or quantise op. This should make things like bitpacked activation optimisations (for both `bconv` and `bmaxpool`) easier, because the rewrite patterns can likely then be done in TableGen rather than C++.

A prerequisite for doing this is #446, so this is blocked for now.