larq / compute-engine

Highly optimized inference engine for Binarized Neural Networks
https://docs.larq.dev/compute-engine
Apache License 2.0

extra model size induced by non-parameter layer #773

Open · bywmm opened 1 year ago

bywmm commented 1 year ago

Hi there, I'm currently benchmarking my BNN with LCE. Following the instructions for larq and LCE, I first implemented my BNN in Keras and converted it to .tflite with LCE. However, the actual size of my .tflite file is much bigger than the theoretically expected one reported by lq.models.summary(). I found that the extra size (about 60 KB) is introduced by a parameter-free custom Keras layer in my model, which is shown below. (This custom layer is used as a substitute for sparse-dense matrix multiplication.)

import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyLayer(Layer):
    def __init__(self, **kwargs):
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.built = True  # note: assigning to self.build would shadow the method

    def call(self, inputs, mask=None):
        data = inputs[0]    # a dense tensor of shape [N, D]
        idx = inputs[1]     # a dense tensor of shape [N, M]
        weight = inputs[2]  # a dense tensor of shape [N, M]

        # Weighted lookup of rows of `data`, selected by `idx` and combined by `weight`.
        idx_sparse = tf.sparse.from_dense(tf.cast(idx, dtype=tf.int32))
        weight_sparse = tf.sparse.from_dense(weight)
        output = tf.nn.embedding_lookup_sparse(data, idx_sparse, weight_sparse)

        return output

    def get_config(self):
        return super(MyLayer, self).get_config()
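
For reference, I convert and measure the model roughly like this (a minimal sketch; model stands for my Keras BNN and the file name is arbitrary):

import larq_compute_engine as lce

# Convert the Keras model to a TFLite flatbuffer with LCE.
# `model` is the Keras BNN built above (placeholder name).
tflite_bytes = lce.convert_keras_model(model)
with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
print(f"model.tflite: {len(tflite_bytes) / 1024:.1f} KB")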

This parameter-free custom layer adds ~60 KB to the .tflite file, which is unaffordable in my case, because the rest of my BNN contributes only 16 KB (slightly bigger than the theoretical model size). So I want to reduce its size.

It seems that the extra size is introduced by the sparse tensors. I ran a simple test to confirm this: replacing the original call() function in MyLayer with the one below brings the overall .tflite size to 17 KB, meaning MyLayer adds only about 1 KB. (This version of MyLayer contains just two matmul operators.)

def call(self, inputs, mask=None):
    # assumes: from tensorflow.keras import backend as K
    data = inputs[0]  # a dense tensor of shape [N, D]
    idx = inputs[1]   # a dense tensor of shape [N, M]

    # Two dense matmuls: [N, M] x [M, N] -> [N, N], then [N, N] x [N, D] -> [N, D]
    output = K.dot(K.dot(idx, tf.transpose(idx)), data)

    return output

Then, as soon as a sparse tensor is involved, as in the following case where I merely convert a dense tensor to a sparse tensor and back again, the overall .tflite size becomes 82 KB!

def call(self, inputs, mask=None):
    data = inputs[0]  # a dense tensor of shape [N, D]
    idx = inputs[1]   # a dense tensor of shape [N, M]

    # Round-trip through a sparse tensor; this alone blows up the .tflite size.
    idx_sparse = tf.sparse.from_dense(idx)
    idx = tf.sparse.to_dense(idx_sparse)

    output = K.dot(K.dot(idx, tf.transpose(idx)), data)

    return output
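
To see which operators account for the extra bytes, the flatbuffer contents can be dumped (a sketch assuming a TF version that ships tf.lite.experimental.Analyzer, 2.9 or later):

import tensorflow as tf

# Lists every subgraph, operator, and tensor in the converted model; the sparse
# round-trip likely shows up as Flex (select-TF) ops rather than TFLite builtins.
tf.lite.experimental.Analyzer.analyze(model_path="model.tflite")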

So, I'm wondering why the sparse tensors introduce so much extra .tflite size, and how I can reduce it and implement the sparse-dense matrix multiplication operator. Can you give me some hints?

Thank you.

Tombana commented 1 year ago

I don't know if TFLite supports this sparse-dense matrix multiplication. I think you should open this issue in the tensorflow repository instead.
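
That said, since your idx and weight inputs are already dense [N, M] tensors, you could try expressing the weighted lookup with only TFLite builtin ops (gather, multiply, sum) instead of going through sparse tensors. An untested sketch, equivalent to embedding_lookup_sparse with combiner="sum" (the default combiner is "mean", which additionally divides by the per-row sum of the weights):

import tensorflow as tf

def dense_weighted_lookup(data, idx, weight):
    # data: [N, D]; idx: [N, M] integer row indices into data; weight: [N, M]
    gathered = tf.gather(data, idx)  # [N, M, D]
    # Weighted sum over the M looked-up rows.
    return tf.reduce_sum(gathered * weight[..., None], axis=1)  # [N, D]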