tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

float16 quantization runs out of memory for LSTM model #1091

Open Black3rror opened 10 months ago

Black3rror commented 10 months ago

Regardless of the size of the LSTM model, converting it to TFLite with float16 quantization runs out of memory.

Code to reproduce the issue (on Google Colab):

import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def create_model():
  model = tf.keras.models.Sequential()

  # For the later TFLite conversion, batch_size and sequence_length must be fixed.
  # E.g., batch_input_shape=[None, 1] will throw an error.
  # This limitation applies only to RNNs; FC or CNN models can use batch_size=None.
  model.add(tf.keras.layers.Embedding(
    input_dim=5,
    output_dim=1,
    batch_input_shape=[1, 1]
  ))

  model.add(tf.keras.layers.LSTM(
    units=1,
    return_sequences=False,
    stateful=False
  ))

  model.add(tf.keras.layers.Dense(5))

  return model

model = create_model()
model.summary()

model.save("/content/model/")

representative_data = np.random.randint(0, 5, (200, 1)).astype(np.float32)

def representative_dataset():
  for sample in representative_data:
    sample = np.expand_dims(sample, axis=0)     # batch_size = 1
    yield [sample]                              # set sample as first (and only) input of the model

# float16 quantization
converter = tf.lite.TFLiteConverter.from_saved_model("/content/model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
# The kernel runs out of memory and crashes on the following line.
tflite_quant_model = converter.convert()

cdh4696 commented 10 months ago

@yyoon Could you please check? Thanks!

malloyca commented 9 months ago

I have also encountered this problem using TensorFlow version 12.2.1 on my system. Non-optimized conversion works fine with LSTM, but float16 optimization is causing my kernel to crash repeatedly.
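
One way to compare the two cases without losing the notebook kernel each time might be to run each conversion in a separate Python process. The sketch below is only an illustration: `run_isolated` and `conversion_script` are hypothetical helper names, the model path is the one from the original post, and it assumes TensorFlow is installed in the interpreter that `sys.executable` points to. It does not fix the memory problem; it only moves the crash into a child process.

```python
import subprocess
import sys


def run_isolated(script: str, timeout_s: float = 600.0) -> subprocess.CompletedProcess:
    """Run `script` in a fresh Python interpreter (hypothetical helper).

    If the child crashes or is killed, the calling process keeps running
    and can inspect the return code and the captured output.
    """
    return subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )


def conversion_script(saved_model_dir: str, float16: bool) -> str:
    """Build the source of a small conversion script (hypothetical helper).

    The converter calls mirror the reproduction code in the original post.
    """
    lines = [
        "import tensorflow as tf",
        f"converter = tf.lite.TFLiteConverter.from_saved_model({saved_model_dir!r})",
    ]
    if float16:
        # Same float16 settings as in the reported reproduction.
        lines += [
            "converter.optimizations = [tf.lite.Optimize.DEFAULT]",
            "converter.target_spec.supported_types = [tf.float16]",
        ]
    lines += [
        "tflite_model = converter.convert()",
        "print(len(tflite_model))",  # size of the converted model in bytes
    ]
    return "\n".join(lines)
```

For example, `run_isolated(conversion_script("/content/model/", float16=False))` could serve as the baseline and the same call with `float16=True` as the failing case. On POSIX systems a negative `returncode` means the child was terminated by a signal (for instance `-9` for SIGKILL, the signal the Linux out-of-memory killer uses), although which process gets killed under memory pressure depends on the system.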

barrypitman commented 2 months ago

Same problem here.