
Is it possible to use TensorRT to speed up an original TensorFlow T5 exported saved_model? #306

Open chenl-chief opened 2 years ago

chenl-chief commented 2 years ago

I've sped up the Hugging Face T5 model with TRT, but how can we speed up a TensorFlow T5 saved_model? I want to serve the sped-up T5 saved_model with TF Serving in a production environment. My environment is:

Docker image: nvcr.io/nvidia/tensorflow:22.05-tf2-py3
GPU: Tesla V100 * 2

I followed the TF-TRT user guide, but it didn't work. I first used this code:

from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import tensorflow_text  # registers the custom ops (e.g. SentencePiece) used by the T5 saved_model

SAVED_MODEL_DIR = '/path/to/t5/export/saved_model'
output_saved_model_dir = '/path/to/save/trt/saved_model'

# Convert with the TF2 converter, requesting FP16 engines.
conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=SAVED_MODEL_DIR,
    conversion_params=conversion_params)

converter.convert()
converter.save(output_saved_model_dir)

It fails when I load the result with tf.saved_model.load; the error message is:

"FAILED_PRECONDITION: Attempting to use uninitialized value"

Then I found that the T5 saved_model had been exported with TF1, so I used the tf.compat.v1 converter instead:

from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import tensorflow_text  # registers the custom ops used by the T5 saved_model
import tensorflow as tf

tf.compat.v1.disable_v2_behavior()

input_saved_model_dir = '/path/to/t5/export/saved_model'
output_saved_model_dir = '/path/to/save/trt/saved_model'
# TF1-style converter for graphs exported with TF1.
converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    max_workspace_size_bytes=(1 << 32),  # 4 GiB TRT workspace
    precision_mode='FP16',
    maximum_cached_engines=100)

converter.convert()
converter.save(output_saved_model_dir)

It still failed:

ValueError: Input 0 of node decoder/block_011/layer_002/rms_norm/scale_1/parallel_0_1/Assign was passed float from decoder/block_011/layer_002/rms_norm/scale_slice_0:0 incompatible with expected float_ref.
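For what it's worth, a float_ref error like this usually means the graph still contains TF1 reference variables. A common workaround is to freeze the variables to constants first and hand TF-TRT the frozen GraphDef. A minimal sketch, assuming the SERVING tag and a hypothetical output node name 'outputs':

import tensorflow as tf
import tensorflow_text  # registers the custom ops referenced by the graph

tf.compat.v1.disable_v2_behavior()

with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    # Load the TF1 saved_model; this also restores its variables.
    tf.compat.v1.saved_model.loader.load(
        sess, [tf.compat.v1.saved_model.tag_constants.SERVING],
        '/path/to/t5/export/saved_model')
    # Fold the reference variables into constants. 'outputs' is a
    # hypothetical node name; take the real one from saved_model_cli.
    frozen_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), output_node_names=['outputs'])

The frozen_graph_def can then be passed to trt.TrtGraphConverter through its input_graph_def argument instead of input_saved_model_dir.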

Could someone tell me: can we use TRT to convert a TF T5 saved_model? If it's possible, how? @DEKHTIARJonathan

mihaimaruseac commented 2 years ago

I no longer work in TF

chenl-chief commented 2 years ago

I no longer work in TF

Sorry, my fault.

chenl-chief commented 2 years ago

Never mind, I found a solution.

chenl-chief commented 2 years ago

The original saved_model took 300 ms with batch_size=32 and sequence_length=128, which is too slow to deploy. So I wanted to speed up T5 with TF-TRT. But when I convert the saved_model with the code below, TF-TRT doesn't work:

from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import tensorflow_text  # registers the custom ops used by the T5 saved_model
import tensorflow as tf

tf.compat.v1.disable_v2_behavior()

input_saved_model_dir = 'exported_model/batch32_length128_0810/1660123651'
output_saved_model_dir = 'trt_saved_model/batch32_length128_0810/1/'
converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    max_workspace_size_bytes=(1 << 32),  # 4 GiB TRT workspace
    max_batch_size=32,
    minimum_segment_size=50,  # only replace subgraphs with at least 50 nodes
    precision_mode='FP32',
    is_dynamic_op=True,  # build engines at runtime, once input shapes are known
    maximum_cached_engines=1)

converter.convert()
converter.save(output_saved_model_dir)

Before using this code, you have to add some code to tensorflow/python/compiler/tensorrt/trt_convert.py; the reference is here. After adding that code, the model converts, but the latency is still unchanged. Could somebody help me with this?
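One quick check when the latency doesn't change is whether any TRTEngineOp nodes actually ended up in the converted graph; with minimum_segment_size=50, TF-TRT may not have replaced any subgraph at all. A diagnostic sketch, assuming the default 'serve' tag:

import tensorflow as tf
import tensorflow_text  # registers the T5 custom ops so the graph can load

tf.compat.v1.disable_v2_behavior()

with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    tf.compat.v1.saved_model.loader.load(
        sess, ['serve'], 'trt_saved_model/batch32_length128_0810/1/')
    graph_def = sess.graph.as_graph_def()
    # Count TRT segments in the main graph and in any library functions.
    num_engines = sum(node.op == 'TRTEngineOp' for node in graph_def.node)
    for func in graph_def.library.function:
        num_engines += sum(node.op == 'TRTEngineOp' for node in func.node_def)
    print('TRTEngineOp nodes found:', num_engines)

If the count is zero, TF-TRT replaced nothing and the latency cannot improve; lowering minimum_segment_size (the default is 3) is the usual first knob to try.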