tensorflow / tensorflow


tflite operators support dependent on tf.shape #33791

Closed DocDriven closed 4 years ago

DocDriven commented 4 years ago

System information

Describe the current behavior I am trying to convert a variational autoencoder into a tflite model. During the process, I stumbled across a weird bug. When explicitly specifying the shape of the sampling layer like this: tf.random.normal(shape=(10,)), the model converts without any errors.

But when the shape is not hard-coded into the model, e.g. when it is inferred from the input tensor(s) like so:

dimension = tf.shape(z_mu)[1]  # index 0 is the batch size
eps = tf.random.normal(shape=(dimension,))

I get the error that tf.random.normal is not supported by the TF Lite runtime:

Some of the operators in the model are not supported by the standard TensorFlow Lite runtime. If those are native TensorFlow operators, you might be able to use the extended runtime by passing --enable_select_tf_ops, or by setting target_ops=TFLITE_BUILTINS,SELECT_TF_OPS when calling tf.lite.TFLiteConverter(). Otherwise, if you have a custom implementation for them you can disable this error with --allow_custom_ops, or by setting allow_custom_ops=True when calling tf.lite.TFLiteConverter(). Here is a list of builtin operators you are using: ADD, EXP, FULLY_CONNECTED, LEAKY_RELU, LOG, MUL. Here is a list of operators for which you will need custom implementations: RandomStandardNormal.

Describe the expected behavior

It does not make sense to me that the (possibly buggy) shape inference should influence whether an op is supported or not. I am also not sure whether the model with the hard-coded shape can be trusted.

Code to reproduce the issue: just set the EXPLICIT flag to False to get the error.

import tensorflow as tf
import pandas as pd
import numpy as np
from tensorflow import keras

print(tf.__version__)

EXPLICIT = True

### DATA
training_data = np.random.rand(1000, 90)
train_dataset = tf.data.Dataset.from_tensor_slices((training_data, training_data))
train_dataset = train_dataset.shuffle(1000).batch(100)

### MODEL
x = keras.layers.Input(shape=(90,))
h = keras.layers.Dense(40, activation=tf.nn.relu)(x)
z_mu = keras.layers.Dense(10)(h)
z_sigma = keras.layers.Dense(10, activation=tf.nn.sigmoid)(h)

#########################################################################################
if EXPLICIT:
    eps = tf.random.normal(shape=(10,)) # this works

else:
    batch_size = tf.shape(z_mu)[0]
    dimension = tf.shape(z_mu)[1]
    eps = tf.random.normal(shape=(batch_size, dimension)) # this does NOT work, also tried using shape=(dimension,)

#########################################################################################
z = z_mu + eps * z_sigma

h_decoded = keras.layers.Dense(40, activation=tf.nn.relu)(z)
x_decoded = keras.layers.Dense(90)(h_decoded)

model = keras.models.Model(x, x_decoded)

### LOSS
recon_err = tf.reduce_sum(tf.abs(x - x_decoded), axis=1)
kl_div = -.5 * tf.reduce_sum(1 + 2 * tf.math.log(z_sigma) - tf.square(z_mu) - tf.square(z_sigma), axis=1)
total_loss = tf.reduce_mean(recon_err + kl_div)
model.add_loss(total_loss)

### TRAINING
model.compile(optimizer='adam')
print(model.summary())
model.fit(train_dataset, epochs=5)

### SAVE
keras_file = 'vae_test.h5'
keras.models.save_model(model, keras_file)

### CONVERSION
test_dataset = np.random.rand(100, 90).astype(np.float32)

def representative_dataset_gen():
    for i in range(100):
        yield [test_dataset[i:i+1]]

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file(keras_file)
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_file_name = 'vae.tflite'
tflite_model = converter.convert()
open(tflite_file_name, 'wb').write(tflite_model)

Can you confirm this behavior, and clarify whether tf.random.normal is supported by the TF Lite runtime or not?

DocDriven commented 4 years ago

After further testing, I think that the tflite model is not working as intended. Although it can be loaded and seems to produce somewhat plausible results, it appears to be a completely deterministic model. If the sampling were working correctly, this would not be the case.

This leads me to believe that the missing implementation of the random op goes unnoticed by the converter. Maybe this is helpful for further investigation.
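
One quick way to check this is to run the converted interpreter twice on the same input and compare the outputs. Here is a minimal sketch (not part of the original script; it assumes the vae.tflite file produced above and a float32 conversion, i.e. without the uint8 input/output settings, which would otherwise require quantizing the input to the input tensor's scale and zero point first):

import numpy as np
import tensorflow as tf

# Load the converted model (assumed filename 'vae.tflite' from the script above).
interpreter = tf.lite.Interpreter(model_path='vae.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

sample = np.random.rand(1, 90).astype(np.float32)

def run_once(x):
    interpreter.set_tensor(input_details[0]['index'], x)
    interpreter.invoke()
    return interpreter.get_tensor(output_details[0]['index'])

out_a = run_once(sample)
out_b = run_once(sample)

# A working sampling layer should draw fresh noise on every invocation,
# so identical outputs here suggest the random op was baked into a
# constant during conversion (an assumption, not a confirmed diagnosis).
print('identical outputs:', bool(np.allclose(out_a, out_b)))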

ymodak commented 4 years ago

Perhaps using SELECT_TF_OPS can help in this case.

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]

See gist for your reference. Thanks!
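
For reference, a minimal sketch of how this would slot into the conversion script from the issue (assuming the same vae_test.h5 file; the output filename is only illustrative, and the full-integer quantization settings are left out here to keep the example focused on the op-set change):

import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file('vae_test.h5')
# Allow TF Lite builtins plus selected TensorFlow ops such as
# RandomStandardNormal, which has no builtin TF Lite kernel.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
with open('vae_select_ops.tflite', 'wb') as f:
    f.write(tflite_model)

Note that a model converted with SELECT_TF_OPS needs a TF Lite runtime with the Flex delegate (e.g. the interpreter shipped in the full TensorFlow pip package, or the select-TF-ops library on mobile) to execute the TensorFlow ops at inference time.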

google-ml-butler[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 4 years ago

Closing as stale. Please reopen if you'd like to work on this further.
