tensorflow / tflite-micro

Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
Apache License 2.0

Quantize LSTM Model #2670

Closed KevinKeppler closed 3 weeks ago

KevinKeppler commented 3 weeks ago

How can I quantize an LSTM model to use the cmsis_nn kernel? With TensorFlow Lite, the converter only supports int8 quantization, but the LSTM micro kernel needs 16-bit weights and cell states. Is there an easy way to do this?

mansnils commented 3 weeks ago

Are you referring to TFL or TFLM? TFLM/CMSIS-NN accepts 8-bit weights: https://github.com/tensorflow/tflite-micro/blob/c01ca97f52ca451bfba714d7eb8ae0349d85f537/tensorflow/lite/micro/kernels/cmsis_nn/unidirectional_sequence_lstm.cc#L431

Here is an example: https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/examples/mnist_lstm
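For reference, the conversion in that example follows the standard full-integer post-training quantization path. Below is a minimal sketch of that flow; the layer sizes, calibration data, and file name are illustrative placeholders, not copied from the example. As the rest of this thread confirms, the converter quantizes the LSTM weights to int8 and chooses int16 for the cell-state tensor on its own.

```python
import numpy as np
import tensorflow as tf

# Tiny illustrative model; a Keras LSTM layer is lowered by the TFLite
# converter to the fused UNIDIRECTIONAL_SEQUENCE_LSTM op.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),  # (time steps, features)
    tf.keras.layers.LSTM(20, return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

def representative_dataset():
    # Calibration samples; replace the random data with real inputs.
    for _ in range(100):
        yield [np.random.rand(1, 28, 28).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization; the converter picks int16 for the
# LSTM cell state by itself.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("lstm_int8.tflite", "wb") as f:
    f.write(tflite_model)
```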

KevinKeppler commented 3 weeks ago

Thanks for the reply. I am referring to TFLM.

Yes, it accepts 8-bit weights, but it needs a 16-bit cell state, so it can't be quantized via the TensorFlow Lite converter.

https://github.com/tensorflow/tflite-micro/blob/c01ca97f52ca451bfba714d7eb8ae0349d85f537/tensorflow/lite/micro/kernels/cmsis_nn/unidirectional_sequence_lstm.cc#L373

And if I want to use the function UnidirectionalSequenceLstmEvalInt8, it also needs 16-bit weights. I think this eval function would be the fastest version.

Do you have any idea how I can convert it? Maybe by manipulating the flatbuffer?

mansnils commented 3 weeks ago

Did you check out the example? I think that should run with TFLM/CMSIS-NN.

If not, you can check out the CMSIS-NN unit tests, where LSTM layers are generated with Keras and quantized with the converter, and all of the CMSIS-NN-supported LSTM functions are called based on that: https://github.com/ARM-software/CMSIS-NN/blob/9d924bdaee51ca8e0c4e86779bbb6d0c9644e555/Tests/UnitTest/RefactoredTestGen/Lib/op_lstm.py#L56

KevinKeppler commented 3 weeks ago

Hello, thank you. I have checked the example and analyzed it with the FlatBuffers tooling: it quantizes the cell state to 16 bits, and so it works. I have to check what differs from my setup. I also have an alternative of adjusting the flatbuffer directly. Thanks, guys.
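(For anyone else landing here: the same check can be made without the FlatBuffers tooling by listing tensor dtypes through the standard TFLite Python interpreter. A minimal sketch, with the model file name as a placeholder; the int8 weight tensors and the int16 cell-state tensor show up directly in the output.)

```python
import tensorflow as tf

# Print every tensor's index, name, and dtype from a converted model,
# making the int8-weights / int16-cell-state split visible.
interpreter = tf.lite.Interpreter(model_path="lstm_int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    print(detail["index"], detail["name"], detail["dtype"])
```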

rascani commented 3 weeks ago

Please check out the requantize tool: https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/tools/requantize_flatbuffer.py
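A rough invocation sketch of that tool, which rewrites an int8-quantized flatbuffer to 16-bit activations; the flag names here are assumptions and may have changed, so confirm with `--help` first:

```bash
# Flag names are assumptions -- confirm with:
#   python tensorflow/lite/micro/tools/requantize_flatbuffer.py --help
python tensorflow/lite/micro/tools/requantize_flatbuffer.py \
  --int8_model_path=lstm_int8.tflite \
  --save_path=lstm_int16.tflite
```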