ericxsun opened 3 years ago
We followed the blog post "Leveraging TensorFlow-TensorRT integration for Low latency Inference" and ended up with a very large SavedModel.
ENV
Convert a model fine-tuned with BERT:
```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

conversion_params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP32)
input_saved_model_dir = 'xxx'
output_saved_model_dir = 'xxx'
converter = trt.TrtGraphConverterV2(input_saved_model_dir=input_saved_model_dir,
                                    conversion_params=conversion_params)
converter.convert()
converter.save(output_saved_model_dir)
```
Size before and after conversion:

```
4.0K  ./bert_finetune_20210303/assets
9.5M  ./bert_finetune_20210303/saved_model.pb
387M  ./bert_finetune_20210303/variables
397M  ./bert_finetune_20210303

4.0K  ./bert_finetune_20210303_fp16/assets
1.1G  ./bert_finetune_20210303_fp16/saved_model.pb
387M  ./bert_finetune_20210303_fp16/variables
1.5G  ./bert_finetune_20210303_fp16

4.0K  ./bert_finetune_20210303_fp32/assets
1.1G  ./bert_finetune_20210303_fp32/saved_model.pb
387M  ./bert_finetune_20210303_fp32/variables
1.5G  ./bert_finetune_20210303_fp32
```
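The jump of `saved_model.pb` from 9.5M to 1.1G while `variables/` stays at 387M suggests the variables were frozen into the graph as constants during conversion. Below is a minimal diagnostic sketch to check this, not something from the original report: it assumes TF 2.x, uses the private `loader_impl` helper, and `'xxx'` stands in for the converted model directory as in the snippet above.

```python
# Diagnostic sketch: count raw tensor bytes frozen into saved_model.pb
# as Const nodes. Assumes TF 2.x; loader_impl is a private TF module.
from tensorflow.python.saved_model import loader_impl

saved_model = loader_impl.parse_saved_model('xxx')  # converted model dir
graph_def = saved_model.meta_graphs[0].graph_def

def const_bytes(nodes):
    # Sum the serialized tensor payloads of all Const nodes.
    # (Small constants stored in typed fields instead of tensor_content
    # are not counted, so this slightly undercounts.)
    return sum(len(n.attr['value'].tensor.tensor_content)
               for n in nodes if n.op == 'Const')

# Constants can live in the top-level graph or in the function library.
total = const_bytes(graph_def.node) + sum(
    const_bytes(fn.node_def) for fn in graph_def.library.function)
print('bytes of constants embedded in saved_model.pb:', total)
```

If most of the 1.1G turns out to be constants, the growth would come from freezing the weights into the proto rather than from TensorRT engines themselves.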
Could anyone help me? Thanks a lot.