tensorflow / tflite-micro

Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
Apache License 2.0

Int8 Inference Speed Drastically Dropped #2631

Closed zyw02 closed 2 weeks ago

zyw02 commented 1 month ago

I quantized a CNN to int8 and measured the average inference speed: the latency is 2x slower than the original fp32 model. Then I used the profiling tools and printed the latency per layer. It turns out that ops like conv2d take many more ticks in the int8 model. But shouldn't the int8 model be the faster one? Can anyone help me with this problem?
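
(For context, the per-layer ticks were collected with the TFLM `MicroProfiler`. Below is a minimal sketch of that setup, assuming a recent tflite-micro checkout; the model array `g_model`, the arena size, and the op list are placeholders, and the exact `MicroInterpreter` constructor signature can differ between TFLM versions.)

```cpp
// Sketch: attach a MicroProfiler so Invoke() reports ticks per op.
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/micro_profiler.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model[];  // flatbuffer of the quantized model (placeholder)

constexpr size_t kTensorArenaSize = 100 * 1024;  // depends on the model
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];

int main() {
  tflite::InitializeTarget();

  const tflite::Model* model = tflite::GetModel(g_model);

  // Register only the ops the CNN actually uses.
  tflite::MicroMutableOpResolver<4> resolver;
  resolver.AddConv2D();
  resolver.AddMaxPool2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  // The profiler records one timing event per op invocation.
  tflite::MicroProfiler profiler;
  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                       kTensorArenaSize,
                                       /*resource_variables=*/nullptr,
                                       &profiler);
  interpreter.AllocateTensors();

  // ... fill interpreter.input(0) with test data here ...

  interpreter.Invoke();
  profiler.LogCsv();  // prints one line per op: tag, ticks
  return 0;
}
```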

ArmRyan commented 1 month ago

Hey @zyw02, could you provide more details? E.g. the model, the hardware, whether you are using reference kernels or optimized kernels, the TFLM version, and any logs or profiling output you collected.
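
The reference-vs-optimized kernel question usually decides this: the generic reference kernels are portable C++ and are not tuned for int8, so int8 conv can come out slower than float there, whereas on Arm Cortex-M targets the int8 speedup typically comes from the CMSIS-NN kernels selected at build time. As a rough example of a Makefile-based build with CMSIS-NN enabled (target and architecture here are placeholders, adjust to your board):

```
make -f tensorflow/lite/micro/tools/make/Makefile \
  TARGET=cortex_m_generic TARGET_ARCH=cortex-m4 \
  OPTIMIZED_KERNEL_DIR=cmsis_nn microlite
```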

github-actions[bot] commented 2 weeks ago

"This issue is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

github-actions[bot] commented 2 weeks ago

"This issue is being closed because it has been marked as stale for 5 days with no further activity."