Closed hamingsi closed 6 months ago
Can you open the tflite files in netron and compare the binary layers? Perhaps the DoReFa quantizer is not picked up by the tflite converter.
Yeah, I tried this. It seems the DoReFa model doesn't have any binary layers. So LCE won't speed up the DoReFa quantizer?
My main concern is whether a computation with activations in [0, 1] and weights in [-1, 1] can be sped up or not. I want to implement an activation like a LIF neuron, which only emits spikes in [0, 1]. Combined with binary weights, that might decrease inference time and memory cost substantially.
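As a side note on why this combination can be fast: with activations in {0, 1} and weights in {-1, +1}, a dot product reduces to bitwise ANDs and popcounts. A minimal Python sketch of the idea (my own illustration of the arithmetic, not LCE's actual kernel):

```python
def binary_dot(a_bits, w_pos_bits, n):
    """Dot product of a {0,1} activation vector and a {-1,+1} weight vector,
    both packed into the low n bits of Python ints (bit i = element i).
    w_pos_bits has a 1 wherever the weight is +1."""
    mask = (1 << n) - 1
    pos = bin(a_bits & w_pos_bits).count("1")          # a_i = 1 and w_i = +1
    neg = bin(a_bits & ~w_pos_bits & mask).count("1")  # a_i = 1 and w_i = -1
    return pos - neg
```

This popcount-style arithmetic is what makes binary layers cheap on CPUs with fast popcount instructions, which is where the hoped-for speedup would come from.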
So LCE won't speed up dorefa quantizer?

That is correct. In general the DoReFa quantizer can output more than 1 bit, so then it is not a binary layer. To get LCE to recognize it as a binary quantizer, you might have to add a specialization for `k_bit==1` where it is implemented without the `round` function but really as a boolean, similar to `ste_sign`.
I did use k_bit=1 in my code, but it still doesn't work.
I mean that the implementation of the DoReFa quantizer needs a specialization for `k_bit==1`.
See here:
https://github.com/larq/larq/blob/v0.13.1/larq/quantizers.py#L680-L682
This would have to be changed to something like this:

```python
def _k_bit_with_identity_grad(x):
    if self.precision == 1:
        # Binary case: output exactly 0 or 1 instead of rounding.
        return tf.where(tf.math.less_equal(x, 0.5), tf.zeros_like(x), tf.ones_like(x)), lambda dy: dy
    else:
        n = 2**self.precision - 1
        return tf.round(x * n) / n, lambda dy: dy
```
Note: I did not test this, you'll have to verify that it works as expected and that the LCE converter recognizes this.
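To make the intended behavior concrete, here is a NumPy sketch of just the forward pass of that specialization (the identity gradient is omitted; the function name is my own, not larq's):

```python
import numpy as np

def k_bit_forward(x, precision):
    """Forward pass mirroring the snippet above: exact {0.0, 1.0} outputs
    when precision == 1, rounding to a k-bit grid otherwise."""
    if precision == 1:
        return np.where(x <= 0.5, 0.0, 1.0)
    n = 2 ** precision - 1
    return np.round(x * n) / n
```

The point of the `precision == 1` branch is that the output is produced by a comparison rather than by `round`, which is the pattern the converter can recognize as binary.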
Thanks. I will try this. But I'm still confused why the full-precision model runs faster than ste_sign.
I'm still confused why the full-precision model runs faster than ste_sign.
On what type of machine are you running this? LCE does not provide optimized code for the x86_64 architecture, only for 32-bit ARM and 64-bit ARM. So on x86_64, it is expected that the full precision model runs faster.
I'm running on a Mac M1 chip. I compile LCE with Bazel using --macos_cpus=arm64. Is that correct?
Compiling `lce_benchmark_model` with `--macos_cpus=arm64` is correct, I think.
It's possible that the M1 chip is more optimized for full-precision layers than for binary layers.
That's amazing. I will try a different ARM device. So LCE does support binary convolution (activations in [0, 1], weights in [-1, 1]). Is that correct?
So LCE does support binary convolution (activations in [0, 1], weights in [-1, 1]). Is that correct?
That is correct. It's always best to check the tflite file in netron to see if the layers got converted to Lce binary layers.
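Besides opening the file in netron, a quick programmatic check is possible: the names of custom ops such as LceBconv2d are stored as plain strings inside the .tflite flatbuffer, so a byte search is a cheap (if crude) heuristic. A sketch, assuming the standard LCE custom op names:

```python
def has_lce_binary_ops(tflite_bytes):
    """Heuristic: return True if the flatbuffer appears to contain
    LCE custom ops (their names are embedded as raw strings)."""
    return any(name in tflite_bytes
               for name in (b"LceBconv2d", b"LceQuantize", b"LceDequantize"))

# Usage: has_lce_binary_ops(open("model.tflite", "rb").read())
```

This only tells you the ops are present somewhere in the file; netron remains the reliable way to see which specific layers were converted.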
Thanks a lot!
I'm trying to compare the DoReFa model with the full-precision and ste_sign models to find out the difference. But I got a result I don't understand: the DoReFa model size is close to the full-precision model rather than the ste_sign model.
Here is my LCE test on a Mac M1 chip: the DoReFa model's inference time is faster than ste_sign (why?) and close to the full-precision model, which is strange. Here is my test code for DoReFa, full precision and ste_sign:
For ste_sign I just switch from `lq.quantizers.DoReFa` to "ste_sign". Here is the full-precision code: