suyash-narain opened this issue 7 months ago
I don't have a VM which matches this architecture with a GPU closely enough. Hi @impjdi, can you please take a look? Thanks.
Hi @pkgoogle @impjdi,
I get the same error when I use any Whisper-based tflite model. Digging a bit deeper, I found that the delegate throws runtime errors because the model contains an op with dynamic-sized tensors, whereas the delegate only supports static-sized tensors. My question now is: why are these ops not falling back onto the CPU instead of raising a runtime error on the GPU? And is there a way to convert dynamic tensors to static ones while converting the model?
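For reference, one way to spot the dynamic-sized tensors is to scan the converted model's tensor details for a `shape_signature` containing -1. A minimal sketch, assuming the full `tensorflow` package (so the Flex ops from SELECT_TF_OPS are linked in) and a placeholder model path:

```python
import tensorflow as tf

# Placeholder path; point this at the converted model.
interpreter = tf.lite.Interpreter(model_path="whisper_tiny.tflite")
interpreter.allocate_tensors()

# Tensors whose shape_signature contains -1 are dynamic-sized, which the
# GPU delegate cannot handle.
for t in interpreter.get_tensor_details():
    sig = t.get("shape_signature")
    if sig is not None and -1 in sig:
        print(t["name"], sig)
```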
I use the script below to generate my Whisper tflite model:
```python
import tensorflow as tf
import transformers
from datasets import load_dataset
from transformers import WhisperProcessor, WhisperFeatureExtractor, TFWhisperForConditionalGeneration, WhisperTokenizer

target = "openai/whisper-tiny.en"
feature_extractor = WhisperFeatureExtractor.from_pretrained(target)
tokenizer = WhisperTokenizer.from_pretrained(target, predict_timestamps=True)
processor = WhisperProcessor(feature_extractor, tokenizer)
model = TFWhisperForConditionalGeneration.from_pretrained(target)

# Loading dataset
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = feature_extractor(
    ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="tf"
)
input_features = inputs.input_features

# Generating transcription
generated_ids = model.generate(input_features=input_features)
print(generated_ids)
transcription = processor.tokenizer.decode(generated_ids[0])
print(transcription)

# Save the model
model.save('./content/tf_whisper_saved')

class GenerateModel(tf.Module):
    def __init__(self, model):
        super(GenerateModel, self).__init__()
        self.model = model

    @tf.function(
        input_signature=[
            tf.TensorSpec((1, 80, 3000), tf.float32, name="input_features"),
        ],
    )
    def serving(self, input_features):
        outputs = self.model.generate(
            input_features,
            max_new_tokens=100,
            return_dict_in_generate=True,
        )
        return {"sequences": outputs["sequences"]}

saved_model_dir = './content/tf_whisper_saved'
tflite_model_path = 'whisper_tiny.tflite'

generate_model = GenerateModel(model=model)
tf.saved_model.save(generate_model, saved_model_dir, signatures={"serving_default": generate_model.serving})

# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8,
    tf.lite.OpsSet.SELECT_TF_OPS,  # enable TensorFlow ops
]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Float16 quantization reduces the size by ~50%
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

# Save the model
with open(tflite_model_path, 'wb') as f:
    f.write(tflite_model)
```
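For what it's worth, a minimal sketch of a CPU sanity check on the converted model (again assuming the full `tensorflow` package, so the Flex ops are available):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="whisper_tiny.tflite")
runner = interpreter.get_signature_runner("serving_default")

# Dummy input matching the TensorSpec used in the conversion script above.
dummy = np.zeros((1, 80, 3000), dtype=np.float32)
output = runner(input_features=dummy)
print(output["sequences"])
```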
The generated model has an op named WHILE, which is INT32, is the second-to-last op, and has multiple inputs. How can I give it static inputs instead, or ensure this op falls back onto the CPU rather than the delegate?
Thanks.
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
source
TensorFlow version
tf 2.14
Custom code
Yes
OS platform and distribution
aarch64 linux
Mobile device
No response
Python version
python 3.10.9
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
I am using an aarch64 device similar to a Raspberry Pi, running TF 2.14. I installed the latest version of tflite_runtime with `pip3 install tflite_runtime`, which installed v2.14.
I have a tflite model sourced from here: https://github.com/usefulsensors/openai-whisper/blob/main/models/whisper.tflite. It works well on CPU, but when I try to execute it with the GPU or NNAPI TFLite delegate, I get a runtime error with no accompanying error log.
[error snippet attached in the original issue]
The code I am using is similar to the one mentioned in this comment: https://github.com/tensorflow/tensorflow/issues/59273#issuecomment-1397704596
I checked the model's op support using the Model Analyzer; the output is in the attached log: model_analyzer_log.txt
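For reference, the GPU-compatibility check can be reproduced with the analyzer API (a minimal sketch; the model path is a placeholder):

```python
import tensorflow as tf

# Prints a per-op report and flags ops the GPU delegate cannot run.
tf.lite.experimental.Analyzer.analyze(
    model_path="whisper.tflite",
    gpu_compatibility=True,
)
```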
Not all ops in this model are supported on the GPU, but the other ops are. My understanding is that ops not supported by the delegate should fall back onto the CPU. Instead of falling back, though, I get a RUNTIME ERROR. Why are unsupported ops not falling back onto the CPU?
Is CPU fallback for unsupported ops not the default behavior in TFLite?
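For reference, one common workaround is whole-model fallback: try to create the interpreter with the delegate, and retry on plain CPU if that fails. A minimal sketch, assuming a platform-specific delegate library name (the `.so` path below is a placeholder):

```python
import tensorflow as tf

def make_interpreter(model_path, delegate_path=None):
    """Build an interpreter with the given delegate, falling back to CPU."""
    if delegate_path:
        try:
            delegate = tf.lite.experimental.load_delegate(delegate_path)
            return tf.lite.Interpreter(model_path=model_path,
                                       experimental_delegates=[delegate])
        except (ValueError, RuntimeError) as e:
            print(f"Delegate failed ({e}); falling back to CPU.")
    return tf.lite.Interpreter(model_path=model_path)

# The delegate library name varies by platform and build; adjust as needed.
interpreter = make_interpreter("whisper.tflite",
                               delegate_path="libtensorflowlite_gpu_delegate.so")
```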
Standalone code to reproduce the issue
Relevant log output
No response