larq / compute-engine

Highly optimized inference engine for Binarized Neural Networks
https://docs.larq.dev/compute-engine
Apache License 2.0

Dorefa model size and behavior with full precision model and ste_sign model #803

Closed hamingsi closed 6 months ago

hamingsi commented 6 months ago

I'm trying to compare the DoReFa model with the full-precision and ste_sign models to find out the difference, but I got a result I don't understand: the DoReFa model size is close to the full-precision model rather than the ste_sign model.

[screenshot: model file sizes of the three variants]

Here is my LCE test on a Mac M1 chip: [screenshots: benchmark results for the three models]. The DoReFa model's inference time is faster than ste_sign's (why?) and close to the full-precision model's, which is strange. Here is my test code for DoReFa, full precision, and ste_sign:

import larq as lq
import tensorflow as tf

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.InputLayer((32, 32, 3), name="input"),
        # First layer (float)
        tf.keras.layers.Conv2D(32, kernel_size=(5, 5), padding="same", strides=3),
        tf.keras.layers.BatchNormalization(),
        # Note: we do NOT add a ReLU here, because the subsequent activation quantizer would destroy all information!
        # Second layer (binary)
        lq.layers.QuantConv2D(
            32,
            kernel_size=(3, 3),
            padding="same",
            strides=2,
            input_quantizer=lq.quantizers.DoReFa(k_bit=1, mode="activations"),
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Third layer (binary)
        lq.layers.QuantConv2D(
            64,
            kernel_size=(3, 3),
            padding="same",
            strides=2,
            input_quantizer=lq.quantizers.DoReFa(k_bit=1, mode="activations"),
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Pooling and final dense layer (float)
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)
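
The only change for the ste_sign variant is the input quantizer of the binary layers; a minimal sketch of the second layer (same hyperparameters as above):

# ste_sign variant of the second layer: only input_quantizer changes.
lq.layers.QuantConv2D(
    32,
    kernel_size=(3, 3),
    padding="same",
    strides=2,
    input_quantizer="ste_sign",  # was lq.quantizers.DoReFa(k_bit=1, mode="activations")
    kernel_quantizer="ste_sign",
    kernel_constraint="weight_clip",
    use_bias=False,
)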

For ste_sign I just switch from lq.quantizers.DoReFa to "ste_sign", as sketched above. Here is the full-precision code:

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.InputLayer((32, 32, 3), name="input"),
        # First layer (float)
        tf.keras.layers.Conv2D(32, kernel_size=(5, 5), padding="same", strides=3),
        tf.keras.layers.BatchNormalization(),
        # Second layer (float here; binary in the quantized models)
        tf.keras.layers.Conv2D(
            32,
            kernel_size=(3, 3),
            padding="same",
            strides=2,
            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Third layer (float here; binary in the quantized models)
        tf.keras.layers.Conv2D(
            64,
            kernel_size=(3, 3),
            padding="same",
            strides=2,
            use_bias=False  # We don't need a bias, since the BatchNorm already has a learnable offset
        ),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("hard_tanh"),
        # Pooling and final dense layer (float)
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)
Tombana commented 6 months ago

Can you open the tflite files in netron and compare the binary layers? Perhaps the DoReFa quantizer is not picked up by the tflite converter.
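
For example, a minimal conversion sketch (the file name model.tflite is illustrative):

import larq_compute_engine as lce

# Convert the Keras model and write the flatbuffer to disk,
# then open it in netron to inspect the ops.
tflite_bytes = lce.convert_keras_model(model)
with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
# Binary convolutions show up as LceBconv2d custom ops;
# layers left at full precision stay as plain Conv2D.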

hamingsi commented 6 months ago

Yeah, I tried this. It seems that the DoReFa model doesn't have any binary layers. So LCE won't speed up the DoReFa quantizer? [screenshot: Netron view of the converted model]

hamingsi commented 6 months ago

My major concern is whether a computation with activations in [0, 1] and weights in [-1, 1] can be sped up or not. I want to implement an activation like a LIF neuron, which only emits spikes in [0, 1]. With binary weights, maybe it will decrease inference time and memory cost substantially.

Tombana commented 6 months ago

So LCE won't speed up the DoReFa quantizer?

That is correct. In general the DoReFa quantizer can output more than 1 bit, in which case it is not a binary layer. To get LCE to recognize it as a binary quantizer, you might have to add a specialization for k_bit == 1 where it is implemented not via the round function but really as a boolean, similar to ste_sign.
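
To illustrate the difference in output encodings (a minimal sketch; the input values are chosen for illustration):

import larq as lq
import tensorflow as tf

x = tf.constant([-0.3, 0.2, 0.7, 1.4])

# DoReFa in "activations" mode clips to [0, 1] and rounds to multiples of
# 1 / (2**k_bit - 1); with k_bit=1 the outputs are {0, 1}, but they are
# produced via tf.round().
print(lq.quantizers.DoReFa(k_bit=1, mode="activations")(x).numpy())  # [0. 0. 1. 1.]

# SteSign outputs {-1, +1}, which the LCE converter recognizes as binary.
print(lq.quantizers.SteSign()(x).numpy())  # [-1.  1.  1.  1.]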

hamingsi commented 6 months ago

So LCE won't speed up the DoReFa quantizer?

That is correct. In general the DoReFa quantizer can output more than 1 bit, in which case it is not a binary layer. To get LCE to recognize it as a binary quantizer, you might have to add a specialization for k_bit == 1 where it is implemented not via the round function but really as a boolean, similar to ste_sign.

I did use k_bit=1 in my code, but it still doesn't work.

Tombana commented 6 months ago

I mean that the implementation of the DoReFa quantizer itself needs a specialization for k_bit == 1. See here: https://github.com/larq/larq/blob/v0.13.1/larq/quantizers.py#L680-L682

This would have to be changed to something like this:

        @tf.custom_gradient
        def _k_bit_with_identity_grad(x):
            if self.precision == 1:
                # Binary special case: threshold at 0.5 and emit a real boolean
                # 0/1 instead of going through tf.round().
                return (
                    tf.where(tf.math.less_equal(x, 0.5), tf.zeros_like(x), tf.ones_like(x)),
                    lambda dy: dy,
                )
            else:
                n = 2**self.precision - 1
                return tf.round(x * n) / n, lambda dy: dy

Note: I did not test this, you'll have to verify that it works as expected and that the LCE converter recognizes this.
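
A quick sanity check for such a patch (a sketch, assuming the patched DoReFa from above; values are illustrative):

import larq as lq
import tensorflow as tf

quantizer = lq.quantizers.DoReFa(k_bit=1, mode="activations")  # patched version

x = tf.constant([0.2, 0.7])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = quantizer(x)

print(y.numpy())                    # expected: [0. 1.]
print(tape.gradient(y, x).numpy())  # straight-through (identity) gradient: [1. 1.]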

hamingsi commented 6 months ago

Thanks. I will try this. But I'm still confused about why the full-precision model runs faster than the ste_sign one.

Tombana commented 6 months ago

I'm still confused about why the full-precision model runs faster than the ste_sign one.

On what type of machine are you running this? LCE does not provide optimized code for the x86_64 architecture, only for 32-bit ARM and 64-bit ARM. So on x86_64, it is expected that the full precision model runs faster.

hamingsi commented 6 months ago

I'm running on a Mac M1 chip. I compiled LCE with bazel using --macos_cpus=arm64. Is that correct?

Tombana commented 6 months ago

Compiling lce_benchmark_model with --macos_cpus=arm64 is correct, I think.

It's possible that the M1 chip is more optimized for full-precision layers than for binary layers.
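
For reference, a hypothetical build-and-run sequence (double-check the target path and flags against the LCE docs):

# Build the benchmark binary for Apple silicon.
bazel build --macos_cpus=arm64 //larq_compute_engine/tflite/benchmark:lce_benchmark_model

# Run it on the converted model; --graph and --num_runs follow the
# TFLite benchmark tool conventions.
./bazel-bin/larq_compute_engine/tflite/benchmark/lce_benchmark_model \
    --graph=model.tflite --num_runs=50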

hamingsi commented 6 months ago

That's amazing. I will try a different ARM device. So LCE does support binary convolution (activations in [0, 1], weights in [-1, 1]). Is that correct?

Tombana commented 6 months ago

So LCE does support binary convolution (activations in [0, 1], weights in [-1, 1]). Is that correct?

That is correct. It's always best to check the tflite file in netron to see if the layers got converted to Lce binary layers.
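
Besides eyeballing the graph in netron, a crude programmatic check (assumes the binary convolutions appear under the custom op name LceBconv2d, so the string occurs verbatim in the flatbuffer):

with open("model.tflite", "rb") as f:
    tflite_bytes = f.read()

# True if at least one Lce binary convolution survived conversion.
print(b"LceBconv2d" in tflite_bytes)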

hamingsi commented 6 months ago

Thanks a lot!