larq / compute-engine

Highly optimized inference engine for Binarized Neural Networks
https://docs.larq.dev/compute-engine
Apache License 2.0

Bit-packing of input tensor along channel dimension #52

Closed arashb closed 4 years ago

arashb commented 5 years ago

The current implementation in compute-engine performs im2col first and bit-packs the resulting rows afterwards.

However, it makes more sense to bit-pack the input matrix along the channel dimension first (which may require extra bit-padding) and do im2col afterwards (or use a fused bit-packing/im2col algorithm).
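For illustration, here is a minimal sketch of what channel-wise bit-packing could look like. This is not the actual compute-engine implementation: the function name `PackChannelsHWC`, the 32-bit word size, and the sign-based binarization convention are all assumptions made for the example.

```cpp
#include <cstdint>
#include <vector>

// Bit-pack an HWC float tensor along the channel dimension into uint32
// words, zero-padding the last word when C is not a multiple of 32.
std::vector<std::uint32_t> PackChannelsHWC(const float* input,
                                           int height, int width,
                                           int channels) {
  const int packed_channels = (channels + 31) / 32;  // ceil(C / 32)
  std::vector<std::uint32_t> packed(
      static_cast<std::size_t>(height) * width * packed_channels, 0u);
  for (int h = 0; h < height; ++h) {
    for (int w = 0; w < width; ++w) {
      const float* pixel = input + (h * width + w) * channels;
      std::uint32_t* out =
          packed.data() + (h * width + w) * packed_channels;
      for (int c = 0; c < channels; ++c) {
        // Binarization convention assumed here: set the bit for
        // negative values, leave it clear for non-negative values.
        if (pixel[c] < 0.0f) out[c / 32] |= (1u << (c % 32));
      }
      // Bits for c >= channels stay 0: this is the extra bit-padding.
    }
  }
  return packed;
}
```

Because HWC keeps each pixel's channels contiguous, packing is a single linear pass per pixel, and im2col can then run unchanged on the packed `uint32` words, shrinking the im2col buffer by roughly 32x relative to a float buffer.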

Tombana commented 5 years ago

Memory layouts:

- HWC // TF Lite Conv2D input --> easy to bitpack channels before im2col
- HWC // (default) TensorFlow Conv2D input --> easy to bitpack channels before im2col
- OHWI // TF Lite Conv2D weights --> easy to bitpack channels before im2col
- HWIO // (default) TensorFlow Conv2D weights --> not easy to bitpack channels before im2col (see the index sketch after this list)
- HWO // TF Lite DepthwiseConv2D weights (not 100% sure about this one)
- HWI // TensorFlow DepthwiseConv2D weights when channel_multiplier is 1
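A small sketch of why the weight layout matters for packing. The index formulas below follow from the layout definitions; the helper names are illustrative and not taken from the compute-engine code base.

```cpp
#include <cstddef>

// OHWI: index = ((o * H + h) * W + w) * I + i, so the stride along the
// input-channel dimension i is 1. 32 consecutive elements can be packed
// into one uint32 with a straight linear read.
inline std::size_t IndexOHWI(int o, int h, int w, int i,
                             int H, int W, int I) {
  return ((static_cast<std::size_t>(o) * H + h) * W + w) * I + i;
}

// HWIO: index = ((h * W + w) * I + i) * O + o, so the stride along i
// is O. Packing the channel bits requires gathering strided elements
// (or transposing to OHWI first), which is why HWIO is not easy to
// bitpack before im2col.
inline std::size_t IndexHWIO(int h, int w, int i, int o,
                             int W, int I, int O) {
  return ((static_cast<std::size_t>(h) * W + w) * I + i) * O + o;
}
```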

Tombana commented 5 years ago

See the comment in #57. We can stick to OHWI in both TF and TF Lite.