arashb closed this 4 years ago.
Memory layouts:
HWC  // TF Lite Conv2D input --> easy to bitpack channels before im2col
HWC  // (default) TensorFlow Conv2D input --> easy to bitpack channels before im2col
OHWI // TF Lite Conv2D weights --> easy to bitpack channels before im2col
HWIO // (default) TensorFlow Conv2D weights --> not easy to bitpack channels before im2col
HWO  // TF Lite DepthwiseConv2D weights (not 100% sure about this one)
HWI  // TensorFlow DepthwiseConv2D weights when channel_multiplier is 1
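Whether a layout is "easy to bitpack channels" comes down to whether the channel dimension is innermost (stride 1) in a row-major buffer. The sketch below uses a hypothetical `strides` helper (not part of TF, TF Lite or compute-engine) and an example 3x3 kernel with 64 input and 128 output channels to compare OHWI and HWIO.

```python
def strides(layout, shape):
    """Return {dim_name: stride} for a row-major (C-order) tensor.

    `layout` is a string such as "OHWI"; `shape` gives the size of each
    dimension in the same order. The last dimension always has stride 1.
    """
    result = {}
    stride = 1
    for name, size in zip(reversed(layout), reversed(shape)):
        result[name] = stride
        stride *= size
    return result


# Example 3x3 kernel with 64 input channels and 128 output channels.
ohwi = strides("OHWI", (128, 3, 3, 64))
hwio = strides("HWIO", (3, 3, 64, 128))
# In OHWI the input channels are contiguous; in HWIO consecutive input
# channels are 128 elements apart because O is the innermost dimension.
print(ohwi["I"], hwio["I"])  # prints: 1 128
```

With stride 1, the channels of one spatial location can be read as a single contiguous run and packed into words directly; with HWIO every input channel has to be gathered with a stride of O first.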
See the comment in #57: we can stick to OHWI in both TF and TF Lite.
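Sticking to OHWI on the TF side means reordering the default HWIO weights once. A minimal pure-Python sketch on nested lists follows; `hwio_to_ohwi` is a hypothetical name used only for illustration. In TensorFlow itself the same reorder would be `tf.transpose(w, perm=[3, 0, 1, 2])`.

```python
def hwio_to_ohwi(w):
    """Reorder a nested-list HWIO weight tensor into OHWI.

    After the reorder, the input channels (I) form the innermost,
    contiguous dimension, so they can be bitpacked directly.
    """
    H, W = len(w), len(w[0])
    I, O = len(w[0][0]), len(w[0][0][0])
    return [
        [[[w[h][x][i][o] for i in range(I)] for x in range(W)] for h in range(H)]
        for o in range(O)
    ]
```

Because the weights are constant at inference time, this reorder only has to happen once (for example at conversion time), not on every forward pass.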
The current implementation in compute-engine is as follows:
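However, it would make more sense to first bitpack the input along the channel dimension (which may require extra bit padding when the number of channels is not a multiple of the bitpacking word size) and then do the im2col on the packed data, or to use a fused bitpacking-im2col algorithm.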
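A minimal pure-Python sketch of the bitpack-first ordering follows. It assumes +1/-1 input values in HWC layout, a hypothetical `bitpack_channels` helper that encodes -1 as bit 1 and +1 as bit 0 (an assumption, not necessarily compute-engine's encoding), 32-bit words with zero padding bits, and a hypothetical stride-1 "valid" `im2col` without spatial padding. Neither function is a compute-engine API.

```python
def bitpack_channels(x_hwc, word_bits=32):
    """Pack the innermost (channel) dim of an HWC tensor of +1/-1 values.

    Assumption: bit 1 encodes -1 and bit 0 encodes +1. If C is not a
    multiple of word_bits, the unused high bits of the last word stay 0.
    """
    packed = []
    for row in x_hwc:
        packed_row = []
        for pixel in row:
            words = []
            for start in range(0, len(pixel), word_bits):
                word = 0
                for bit, value in enumerate(pixel[start:start + word_bits]):
                    if value < 0:
                        word |= 1 << bit
                words.append(word)
            packed_row.append(words)
        packed.append(packed_row)
    return packed


def im2col(x_hwd, kh, kw):
    """Stride-1 'valid' im2col on an H x W x D nested list.

    D can hold raw values or packed words; each output row concatenates
    the kh * kw pixel vectors of one patch in row-major patch order.
    """
    H, W = len(x_hwd), len(x_hwd[0])
    rows = []
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            row = []
            for di in range(kh):
                for dj in range(kw):
                    row.extend(x_hwd[i + di][j + dj])
            rows.append(row)
    return rows
```

Calling `im2col(bitpack_channels(x), 3, 3)` yields the same rows as running `im2col` on the unpacked `x` and then packing each per-pixel channel chunk, but the copy moves ceil(C/32) words per pixel instead of C values. For C = 40 and a 3x3 kernel, each patch row shrinks from 360 values to 18 words, and the last word of every pixel carries 24 padding bits.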