tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

[RNN] Stateful LSTM can't be converted to TF Lite with Integer Quantization #803

Open AnastGerus opened 3 years ago

AnastGerus commented 3 years ago

1. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • TensorFlow installation (pip package or built from source): pip package
  • TensorFlow library (version, if pip package or github SHA, if built from source): tf-nightly 2.7.0.dev20210819

2. Code

Please check my code below. Put the 'LSTMlayer.onnx' file into the folder 'path' (or modify the path). This code fails with the described issue, but if you change QUANTIZATION = 'None', it works.

import tensorflow as tf
import os
import numpy as np

path = r"\test"  # raw string so that '\t' is not interpreted as a tab escape
QUANTIZATION = 'IntegerWithFloatFallback'  # 'IntegerWithFloatFallback' or 'None'

def representative_dataset():
    dummy = np.zeros((1, 1, 512), dtype=np.float32)
    yield [dummy]

converter = tf.lite.TFLiteConverter.from_saved_model(path)

# we need experimental_enable_resource_variables and "select TensorFlow ops" for
# AssignVariableOp, ReadVariableOp, VarHandleOp operations, otherwise you will get an error
converter.experimental_enable_resource_variables = True
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS,
  tf.lite.OpsSet.SELECT_TF_OPS
]

if QUANTIZATION == 'IntegerWithFloatFallback':
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset

tflite_model = converter.convert()
with open(os.path.join(path, 'model.tflite'), 'wb') as f:
  f.write(tflite_model)

3. Failure after conversion

File "...\git_issue_code.py", line 26, in <module>
    tflite_model = converter.convert()
File "...\lib\site-packages\tensorflow\lite\python\lite.py", line 763, in wrapper
    return self._convert_and_export_metrics(convert_func, *args, **kwargs)
File "...\lib\site-packages\tensorflow\lite\python\lite.py", line 749, in _convert_and_export_metrics
    result = convert_func(self, *args, **kwargs)
File "...\lib\site-packages\tensorflow\lite\python\lite.py", line 1031, in convert
    return self._optimize_tflite_model(
File "...\lib\site-packages\tensorflow\lite\python\convert_phase.py", line 226, in wrapper
    raise error from None  # Re-throws the exception.
File "...\lib\site-packages\tensorflow\lite\python\convert_phase.py", line 216, in wrapper
    return func(*args, **kwargs)
File "...\lib\site-packages\tensorflow\lite\python\lite.py", line 714, in _optimize_tflite_model
    model = self._quantize(
File "...\lib\site-packages\tensorflow\lite\python\lite.py", line 517, in _quantize
    calibrate_quantize = _calibrator.Calibrator(result,
File "...\lib\site-packages\tensorflow\lite\python\optimize\calibrator.py", line 78, in __init__
    raise ValueError("Failed to parse the model: %s." % e)
ValueError: Failed to parse the model: Op FlexVarHandleOp missing inputs.

I have also reproduced this issue using the Keras LSTM layer directly. AssignVariableOp, ReadVariableOp, and VarHandleOp are needed when you use stateful=True for the LSTM layer (which is a very important option for "infinite" data, e.g. an audio stream). Please contact me if you need any additional info.

Thanks, Best regards, Anastasiia

AnastGerus commented 3 years ago

LSTMlayer.txt

Please rename this file to .onnx, because I can't upload an .onnx file to GitHub.

lintian06 commented 3 years ago

Hi @AnastGerus ,

For LSTM conversion, there is an article and a Colab notebook describing the process; you can follow the Colab to do it. Among the four kinds of quantization, you could also try other options such as float16 or dynamic range. Hope it helps.

We cannot guarantee that an ONNX-generated graph can always be converted to TFLite with quantization, as we don't know how the initial TF graph was created from ONNX.
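For reference, a minimal sketch of the dynamic-range and float16 options mentioned above (the SavedModel path is a placeholder, not from this thread):

import tensorflow as tf

# Dynamic-range quantization: weights become int8, activations stay float.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")  # placeholder
converter.optimizations = [tf.lite.Optimize.DEFAULT]
dynamic_range_model = converter.convert()

# Float16 quantization: weights are stored as float16 (roughly 2x smaller).
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")  # placeholder
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
float16_model = converter.convert()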

AnastGerus commented 3 years ago

Hi @lintian06, Thank you for your attention!

I've seen those articles and the Colab. The "Known issues/limitations" section of the article actually says:

  1. Currently there is support only for converting stateless Keras LSTM (default behavior in Keras). Stateful Keras LSTM conversion is future work.
  2. It is still possible to model a stateful Keras LSTM layer using the underlying stateless Keras LSTM layer and managing the state explicitly in the user program. Such a TensorFlow program can still be converted to TensorFlow Lite using the feature being described here.

So I've tried to do it another way, but it doesn't work with 'Integer' quantizations. It works correctly with 'float16' and 'dynamic range' quantization, but a 2x smaller model isn't enough for me, unfortunately. It would be great to get a 4x smaller model (as described in the article).

I don't have to use ONNX conversion. This issue can easily be reproduced in the Colab if you change the LSTM layer to stateful=True and add Integer optimization.

Maybe you could provide a code example of what is described above:

model a stateful Keras LSTM layer using the underlying stateless Keras LSTM layer and managing the state explicitly in the user program.

How exactly can I manage the states in C++ code (TF Lite) when I have converted a stateless Keras LSTM layer? Wouldn't it have problems with the numeric dynamic range if the calibrator (during the TF -> TF Lite conversion) computed it for a stateless LSTM, but I then use the model as stateful?

Thanks, Best regards, Anastasiia

lintian06 commented 3 years ago

Regarding limitation 1: it is because the latent variable is turned into constants, and the conversion cannot handle the control flow correctly. Can you try a stateless Keras LSTM?

If you stick with a stateful Keras LSTM, then for the C++ code you can extract the subgraph of the TF model that runs only one step ("without the for-loop"), and use C++ code to loop over the LSTM state. However, you then have to handle the TFLite conversion details yourself.

Either way, the TFLite model only contains one step of the LSTM, and the state handling (init/update/exit) is done by the surrounding code.

Hope it could help.
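As an illustration of this idea, here is a minimal, hypothetical sketch (layer sizes and names are placeholders, not from this thread) of a stateless one-step Keras LSTM whose hidden and cell state are explicit model inputs and outputs, so the calling code carries the state across steps:

import numpy as np
import tensorflow as tf

UNITS, FEATURES = 20, 28  # placeholder sizes

# One time step per call; the LSTM state is passed in and returned explicitly.
x = tf.keras.Input(shape=(1, FEATURES), batch_size=1)
h_in = tf.keras.Input(shape=(UNITS,), batch_size=1)
c_in = tf.keras.Input(shape=(UNITS,), batch_size=1)
y, h_out, c_out = tf.keras.layers.LSTM(
    UNITS, stateful=False, return_state=True)(x, initial_state=[h_in, c_in])
step_model = tf.keras.Model([x, h_in, c_in], [y, h_out, c_out])

# The caller (Python here; the same pattern applies to the C++ TFLite
# interpreter) initializes the state and feeds each step's output state
# back in as the next step's input state.
h = np.zeros((1, UNITS), np.float32)
c = np.zeros((1, UNITS), np.float32)
for frame in np.random.rand(10, 1, 1, FEATURES).astype(np.float32):
    y, h, c = step_model([frame, h, c])

A model like this can then be exported as a SavedModel and converted with a representative dataset that also yields the state tensors, so the calibrator sees realistic state values.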

AnastGerus commented 3 years ago

Hi @lintian06 ,

I have tried to do something similar by "fixing" the size of the input variables with a concrete function (code based on the Colab):

model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(20, stateful=True, time_major=False, return_sequences=True)
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

run_model = tf.function(lambda x: model(x))
BATCH_SIZE = 1
STEPS = 28
INPUT_SIZE = 28
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], tf.dtypes.float32))

# model directory
MODEL_DIR = r"\test"  # raw string so that '\t' is not interpreted as a tab escape
model.save(MODEL_DIR, save_format="tf", signatures=concrete_func)

converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_DIR)

converter.experimental_enable_resource_variables = True
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.
  tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.
]

def representative_dataset():
  for _ in range(100):
    data = np.random.rand(BATCH_SIZE, STEPS, INPUT_SIZE)
    yield [data.astype(np.float32)]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

tflite_model = converter.convert()

Still, it doesn't work with Integer quantization (only Float16 and DynamicRange work). It does work if you remove stateful=True from the LSTM.

If you meant a different approach, please help with a code example. Thank you for your help! I'm new to LSTM conversion in TF & TFLite.

Best regards, Anastasiia

jakobwowy commented 3 years ago

@AnastGerus I have the same problem converting a model with stateful LSTM layers.

Have you seen this: https://github.com/tensorflow/tensorflow/issues/48282

I think this should be a workaround, but I haven't figured out how to handle the hidden states of multiple LSTM layers.
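One possible way to do this (a hypothetical sketch, not from the linked issue; the input/output names 'x', 'y', 'h1'/'c1'/'h2'/'c2' and the sizes are assumed placeholders) is to expose one (h, c) pair per LSTM layer in the model signature and feed each layer's output state back in on the next call:

import numpy as np
import tensorflow as tf

# The converted model's signature is assumed to take the input frame plus one
# (h, c) pair per LSTM layer, and to return the output plus the updated states.
interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
runner = interpreter.get_signature_runner()

UNITS1, UNITS2, FEATURES = 20, 20, 28  # placeholder sizes
state = {
    "h1": np.zeros((1, UNITS1), np.float32), "c1": np.zeros((1, UNITS1), np.float32),
    "h2": np.zeros((1, UNITS2), np.float32), "c2": np.zeros((1, UNITS2), np.float32),
}

for frame in np.random.rand(10, 1, 1, FEATURES).astype(np.float32):
    out = runner(x=frame, **state)   # 'x' is an assumed input name
    for k in state:                  # carry every layer's state to the next step
        state[k] = out[k]
    y = out["y"]                     # assumed output name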

AnastGerus commented 3 years ago

Hi @jakobwowy, Thank you for this link, I'll try it.

Regarding an NN with multiple LSTM layers: I had a similar problem in the onnx_tf converter, and I worked around it by splitting my NN into a few NNs, each with only one LSTM layer, and running inference sequentially. I hope it helps you!

Thanks, Anastasiia

AnastGerus commented 3 years ago

Unfortunately, this solution didn't help me, because I need to work with the format converted from ONNX (TensorflowRep), which is not Keras...

AnastGerus commented 3 years ago

Hi @lintian06, Could you please provide an example of

defining a signature_def specifying inputs and outputs when exporting a saved model

when I don't have a Keras model? Only the ONNX model or a TF frozen graph is available in my case...

Thank you, Anastasiia
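For what it's worth, one common pattern for this (a hypothetical sketch; the file name and the tensor names 'input:0' / 'output:0' are placeholders, not taken from this thread) is to wrap the frozen GraphDef as a concrete function and save it with an explicit serving signature:

import tensorflow as tf

# Load the frozen graph.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("frozen_graph.pb", "rb") as f:   # placeholder file name
    graph_def.ParseFromString(f.read())

# Wrap it as a ConcreteFunction and prune to the desired inputs/outputs.
wrapped = tf.compat.v1.wrap_function(
    lambda: tf.compat.v1.import_graph_def(graph_def, name=""), [])
concrete_func = wrapped.prune(
    feeds=wrapped.graph.get_tensor_by_name("input:0"),     # placeholder tensor name
    fetches=wrapped.graph.get_tensor_by_name("output:0"))  # placeholder tensor name

# Save a SavedModel whose serving signature is the pruned function.
module = tf.Module()
tf.saved_model.save(module, "saved_model_dir",
                    signatures={"serving_default": concrete_func})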

zc1616 commented 3 years ago

Hi @AnastGerus , I have the same problem when converting torch->onnx->tflite. Have you solved it?

It seems tensorflow==2.6.0 or tf-nightly can convert to tflite, but it logs an error when trying to run the network forward.

AnastGerus commented 3 years ago

Hi @lzc16 , Unfortunately, no, the issue isn't solved. In my case the failure happens during conversion.

AnastGerus commented 3 years ago

Hi @lzc16 , I have managed to convert my model using the nightly version, but I also get an error at runtime:

Could not find variable lstm_kernel_lstm_31. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Container localhost does not exist. (Could not find resource: localhost/lstm_kernel_lstm_31)
    (while executing 'ReadVariableOp' via Eager)
Node number 26 (TfLiteFlexDelegate) failed to invoke.
Node number 42 (WHILE) failed to invoke.

Hi @lintian06 , Could you please recommend a way to solve this? Does one exist?

Thanks

zc1616 commented 3 years ago

Hi @AnastGerus ,

Thanks for your reply. I've run into the same situation as you: the model can be converted to tflite with tf-nightly, and it logs a similar error. It seems my problem is that tflite doesn't support the 'FlexVarHandleOp' op. Maybe defining the LSTM in Keras would solve the problem, but that doesn't suit my torch->onnx->tflite path; you could try that approach, though.

Best regards!

AnastGerus commented 3 years ago

Hi @lzc16 , I have the same path - 'torch -> onnx -> tflite' - so I can't use Keras. I'm just trying to find a solution or some workaround.

Best regards

zc1616 commented 3 years ago

Hi @AnastGerus , Is there any suggestion on how to modify the tensorflow .pb model manually? Or to redefine the net in TensorFlow (using Keras) and load the weights from the ONNX model?

Thanks

zc1616 commented 3 years ago

Hi @AnastGerus

I got an explanation from https://github.com/tensorflow/tensorflow/issues/52041.

AnastGerus commented 3 years ago

Hi @lzc16, I can see from that issue that the problem was in the onnx-tf conversion. In my case, the TF model works OK, and the TFLite model with float16 or Dynamic Range quantization also works fine. So the problem is specific to Integer quantization.

Could you please recommend a way to modify the tensorflow .pb model manually?

Thanks

zc1616 commented 3 years ago

Hi @AnastGerus ,

I tried setting the tensorflow .pb weights manually, following this author: https://github.com/onnx/onnx-tensorflow/issues/971#issuecomment-926467939.

By the way, did you try another way in onnx-tf that makes the TF model work? I want to figure out where I made the mistake.
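As a rough illustration of patching weights in a frozen .pb directly (a hypothetical sketch, not the linked author's code; the file names and the node name are placeholders):

import numpy as np
import tensorflow as tf

# Load the frozen GraphDef.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("model.pb", "rb") as f:              # placeholder file name
    graph_def.ParseFromString(f.read())

# Overwrite the value of a Const node with new weights.
for node in graph_def.node:
    if node.op == "Const" and node.name == "lstm_kernel":   # placeholder node name
        new_value = np.load("lstm_kernel.npy")               # placeholder weights
        node.attr["value"].tensor.CopyFrom(
            tf.make_tensor_proto(new_value, dtype=tf.float32))

# Serialize the patched graph back to disk.
with tf.io.gfile.GFile("model_patched.pb", "wb") as f:
    f.write(graph_def.SerializeToString())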

AnastGerus commented 3 years ago

Hi @lzc16 , I don't really get what other way you are talking about... Please specify.

zc1616 commented 3 years ago

Hi @AnastGerus ,

I'm sorry I didn't describe the problem clearly. I can see that you can make it work when you set QUANTIZATION = 'None'.

However, in my case, converting ONNX to tflite without quantization already logs an error.

Therefore, I'm confused about how you convert torch to ONNX, and I wonder if I overlooked a key step.


AnastGerus commented 3 years ago

Hi @lzc16, I didn't do anything special to convert from torch to ONNX; I just used the standard export function.
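Presumably that refers to torch.onnx.export; a minimal, hypothetical sketch (the module, shapes, and file name are placeholders, not from this thread):

import torch

# Export a PyTorch LSTM to ONNX with torch.onnx.export.
model = torch.nn.LSTM(input_size=512, hidden_size=512, batch_first=True)
dummy = torch.zeros(1, 1, 512)                        # (batch, steps, features)
torch.onnx.export(model, dummy, "LSTMlayer.onnx", opset_version=13)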