Closed AdamHillier closed 4 years ago
I roughly implemented the required changes in the converter, which was rather painless. However, for the conversion patterns that read and write the bitpacked output of the `bconv`, it might make sense to discuss some of the implementation details beforehand.
We briefly discussed the three possible implementations before, but I think it might be useful to have them explicitly written down here for reference:
**Option 1:** The main idea here is that we introduce a `quantize` and a `dequantize` op to replace the current `bsign` op (alternative names would be `bsign`/`dequantize`, `binarize`/`dequantize`, `pack_bits`/`unpack_bits`, `pack`/`unpack`, ...). The signatures of the ops would be `{float32, int8} -> {int32}` and `{int32} -> {float32, int8}` for `quantize` and `dequantize` respectively. This is very close to how TFLite and MLIR handle int8 quantization, and it is how the converter will handle it internally in any case, regardless of which possibility we pick for the kernels.

While this would introduce a new op in LCE, the only other downside I see is that `float32 -> float32` binarization would potentially become slower, since it would turn into `float32 -> int32 -> float32`. In general I don't think this is too problematic, as `float32 -> float32` binarization is not really useful in practice anyway.
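To make the first option concrete, here is a minimal NumPy sketch of what a `quantize`/`dequantize` pair could look like. The function names, bit order, and padding scheme are illustrative assumptions, not the actual LCE kernel implementation:

```python
import numpy as np

def quantize(x):
    # Hypothetical `quantize`: pack the signs of a float32/int8 tensor
    # into int32 words, one bit per element (bit set when negative).
    flat = x.reshape(-1)
    flat = np.pad(flat, (0, -len(flat) % 32))  # fill the last word
    bits = (flat < 0).astype(np.uint8)
    return np.packbits(bits, bitorder="little").view(np.int32)

def dequantize(packed, n, dtype=np.float32):
    # Hypothetical `dequantize`: unpack int32 words back to +1/-1
    # values (the first n elements) in float32 or int8.
    bits = np.unpackbits(packed.view(np.uint8), bitorder="little")[:n]
    return np.where(bits == 1, -1, 1).astype(dtype)
```

Under this sketch, the `float32 -> float32` round trip mentioned above is literally `dequantize(quantize(x), x.size)`, which is why it would be slightly slower than a direct sign op.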
**Option 2:** This option would combine the previous approach into a single `bsign` op with the signature `{float32, int8, int32} -> {float32, int8, int32}`. While this makes the kernel implementation a bit more complicated, it would remove the need to add a new op.
**Option 3:** This is the option which is closest to the current approach: we would add the possibility to output a bitpacked tensor, so that the signature of the `bsign` op would become `{float32, int8} -> {float32, int8, int32}`. With this option we need to be a bit careful in the converter and make sure to only convert `tfl.maxpool_2d -> lq.bsign` to `lq.bsign -> lq.bmaxpool_2d` in cases where it is followed by a `bconv` op that supports bitpacked `int32` input, since otherwise the converter would output a broken graph.
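The guard the converter would need for the third option could look roughly like this sketch; the `Op` class, op names, and the `supports_bitpacked_input` flag are hypothetical stand-ins for the real converter data structures:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Op:
    name: str
    supports_bitpacked_input: bool = False
    consumers: List["Op"] = field(default_factory=list)

def can_swap_maxpool_and_bsign(bsign: Op) -> bool:
    # The rewrite tfl.maxpool_2d -> lq.bsign  =>  lq.bsign -> lq.bmaxpool_2d
    # changes the dtype flowing out of the pair from float32/int8 to
    # bitpacked int32, so it is only safe when every consumer can read
    # bitpacked int32 input; otherwise the rewrite breaks the graph.
    return all(c.name == "lq.bconv2d" and c.supports_bitpacked_input
               for c in bsign.consumers)
```

The key point is that the check must cover *all* consumers of the `bsign` output, not just one, before the pattern is allowed to fire.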
At the moment, our `bconv` op can read either float, int8, or bitpacked inputs, and can write either float, int8, or bitpacked outputs. When the input is not bitpacked, the first step that the op performs is calling a bitpacking procedure. Conceptually, this is almost like a 'sub-op' which doesn't need to be the responsibility of the `bconv`. We could instead make the `bconv` op accept only bitpacked inputs, and insert dedicated `bsign` ops where needed that take float or int8 inputs and produce bitpacked outputs. A `bsign` op already exists in our code base, but we generally fuse it into a subsequent `bconv` and so don't use it directly. This would have several advantages:
- It would simplify the `bconv` op implementation. This would reduce nine combinations of input/output types to just three, improve the 'separation of concerns' in our ops, and improve codebase readability and approachability.
- It would make the `bsign` op similar to a TFL cast or quantise op. This should make things like bitpacked activation optimisations (for both `bconv` and `bmaxpool`) easier, because the rewrite patterns can likely then be done in TableGen rather than C++.

A prerequisite for doing this is #446, so this is blocked for now.