tensorflow / tensorrt

TensorFlow/TensorRT integration

Local rendezvous is aborting with status: NOT_FOUND: TRTEngineCacheResource not yet created while converting a saved model to trt engine #336

Open devvaibhav455 opened 8 months ago

devvaibhav455 commented 8 months ago

I am trying to convert a TensorFlow SavedModel to a TensorRT engine using the Python script below.

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Conversion Parameters 
conversion_params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)

input_saved_model_dir = "/home/administrator/Documents/penguin_behavior_detection/tf_onnx_trt_stuff/seagate_exported_model/saved_model"
output_saved_model_dir = "/home/administrator/Documents/penguin_behavior_detection/tf_onnx_trt_stuff/"

converter = trt.TrtGraphConverterV2(input_saved_model_dir=input_saved_model_dir, conversion_params=conversion_params)

# Converter method used to partition and optimize TensorRT compatible segments
converter.convert()

converter.summary()

# Save the model to the disk 
converter.save(output_saved_model_dir)
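
The deprecation warning in the terminal output further down asks for individual converter parameters instead of conversion_params. A minimal sketch of the same script in that form follows, assuming this TensorFlow version accepts precision_mode as a keyword argument of TrtGraphConverterV2. The wrapper name convert_to_trt is hypothetical and only there for illustration, and it is not confirmed whether this change affects the TRTEngineCacheResource warnings.

```python
def convert_to_trt(input_saved_model_dir, output_saved_model_dir):
    """Convert a SavedModel with TF-TRT, passing precision_mode directly.

    Hypothetical helper: same steps as the script above, but without the
    deprecated conversion_params argument.
    """
    # Imported inside the function so that defining the helper does not
    # require a TensorRT-enabled TensorFlow build.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_saved_model_dir,
        # Assumption: precision_mode is accepted as an individual parameter.
        precision_mode=trt.TrtPrecisionMode.FP16,
    )

    # Partition the graph and optimize the TensorRT-compatible segments.
    converter.convert()
    converter.summary()

    # Save the converted model to disk.
    converter.save(output_saved_model_dir)
    return converter
```

It would be called with the same two directories as in the script above, for example convert_to_trt(input_saved_model_dir, output_saved_model_dir).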

This is the structure of the seagate_exported_model directory:

.
├── checkpoint
│   ├── checkpoint
│   ├── ckpt-0.data-00000-of-00001
│   └── ckpt-0.index
├── pipeline.config
└── saved_model
    ├── assets
    ├── fingerprint.pb
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index

I get the following output in the terminal:

2024-04-01 20:36:22.993604: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:From /home/administrator/Documents/penguin_behavior_detection/tf_onnx_trt_stuff/convert_to_trt.py:23: calling TrtGraphConverterV2.__init__ (from tensorflow.python.compiler.tensorrt.trt_convert) with conversion_params is deprecated and will be removed in a future version.
Instructions for updating:
Use individual converter parameters instead
2024-04-01 20:36:25.663756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9528 MB memory:  -> device: 0, name: NVIDIA TITAN V, pci bus id: 0000:c1:00.0, compute capability: 7.0
2024-04-01 20:36:25.664265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10431 MB memory:  -> device: 1, name: NVIDIA TITAN V, pci bus id: 0000:e1:00.0, compute capability: 7.0
2024-04-01 20:36:42.381116: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2024-04-01 20:36:42.381230: I tensorflow/core/grappler/clusters/single_machine.cc:361] Starting new session
2024-04-01 20:36:42.382733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9528 MB memory:  -> device: 0, name: NVIDIA TITAN V, pci bus id: 0000:c1:00.0, compute capability: 7.0
2024-04-01 20:36:42.382881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10431 MB memory:  -> device: 1, name: NVIDIA TITAN V, pci bus id: 0000:e1:00.0, compute capability: 7.0
2024-04-01 20:36:47.126394: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2024-04-01 20:36:47.126499: I tensorflow/core/grappler/clusters/single_machine.cc:361] Starting new session
2024-04-01 20:36:47.127907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9528 MB memory:  -> device: 0, name: NVIDIA TITAN V, pci bus id: 0000:c1:00.0, compute capability: 7.0
2024-04-01 20:36:47.128046: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10431 MB memory:  -> device: 1, name: NVIDIA TITAN V, pci bus id: 0000:e1:00.0, compute capability: 7.0
2024-04-01 20:36:47.655835: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:186] Calibration with FP32 or FP16 is not implemented. Falling back to use_calibration = False.Note that the default value of use_calibration is True.
2024-04-01 20:36:47.730476: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:970] 

################################################################################
TensorRT unsupported/non-converted OP Report:
        - GatherV2 -> 46x
        - StridedSlice -> 35x
        - Sub -> 30x
        - Shape -> 24x
        - Cast -> 22x
        - ConcatV2 -> 19x
        - Mul -> 19x
        - ExpandDims -> 18x
        - Pack -> 17x
        - Identity -> 17x
        - Select -> 16x
        - Fill -> 15x
        - Reshape -> 15x
        - Placeholder -> 14x
        - Less -> 10x
        - Unpack -> 10x
        - Greater -> 9x
        - AddV2 -> 8x
        - Pad -> 8x
        - Switch -> 8x
        - NonMaxSuppressionV5 -> 7x
        - Minimum -> 7x
        - Merge -> 7x
        - NextIteration -> 6x
        - Enter -> 6x
        - Split -> 5x
        - Slice -> 5x
        - RealDiv -> 4x
        - Maximum -> 4x
        - Round -> 4x
        - Transpose -> 3x
        - Range -> 3x
        - NoOp -> 3x
        - Reciprocal -> 2x
        - Squeeze -> 2x
        - ResizeBilinear -> 2x
        - Exit -> 2x
        - Exp -> 2x
        - TopKV2 -> 2x
        - Tile -> 2x
        - TensorListStack -> 2x
        - TensorListReserve -> 2x
        - TensorListSetItem -> 2x
        - Where -> 1x
        - TensorListGetItem -> 1x
        - TensorListFromTensor -> 1x
        - GreaterEqual -> 1x
        - Sum -> 1x
        - LogicalAnd -> 1x
        - LoopCond -> 1x
--------------------------------------------------------------------------------
        - Total nonconverted OPs: 451
        - Total nonconverted OP Types: 50
For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops.
################################################################################

2024-04-01 20:36:48.146815: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:1298] The environment variable TF_TRT_MAX_ALLOWED_ENGINES=20 has no effect since there are only 10 TRT Engines with  at least minimum_segment_size=3 nodes.
2024-04-01 20:36:48.182719: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:799] Number of TensorRT candidate segments: 10
2024-04-01 20:36:48.224087: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 0 consisting of 6 nodes by TRTEngineOp_000_000.
2024-04-01 20:36:48.224163: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 1 consisting of 1993 nodes by TRTEngineOp_000_001.
2024-04-01 20:36:48.227138: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:916] TF-TRT Warning: Cannot replace segment 2 consisting of 6 nodes by TRTEngineOp_000_002 reason: Segment has no inputs (possible constfold failure) (keeping original segment).
2024-04-01 20:36:48.227395: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 3 consisting of 5 nodes by TRTEngineOp_000_003.
2024-04-01 20:36:48.227442: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 4 consisting of 4 nodes by TRTEngineOp_000_004.
2024-04-01 20:36:48.227482: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 5 consisting of 4 nodes by TRTEngineOp_000_005.
2024-04-01 20:36:48.227535: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 6 consisting of 25 nodes by TRTEngineOp_000_006.
2024-04-01 20:36:48.227592: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 7 consisting of 4 nodes by TRTEngineOp_000_007.
2024-04-01 20:36:48.227635: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 8 consisting of 4 nodes by TRTEngineOp_000_008.
2024-04-01 20:36:48.227671: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 9 consisting of 3 nodes by TRTEngineOp_000_009.
TRTEngineOP Name                 Device        # Nodes # Inputs      # Outputs     Input DTypes       Output Dtypes      Input Shapes       Output Shapes     
================================================================================================================================================================

----------------------------------------

TRTEngineOp_000_000              device:GPU:0  6       1             1             ['float32']        ['float32']        [[1, -1, -1, 3]]   [[1, -1, -1, 3]]  

        - Const: 3x
        - Mul: 2x
        - Sub: 1x

----------------------------------------

TRTEngineOp_000_001              device:GPU:0  1931    1             2             ['float32']        ['float32', 'f ... [[1, 640, 640, 3]] [[1, 76725, 4] ...

        - AddV2: 15x
        - BatchMatMulV2: 32x
        - BiasAdd: 122x
        - ConcatV2: 2x
        - Const: 864x
        - Conv2D: 165x
        - DepthwiseConv2dNative: 94x
        - FusedBatchNormV3: 133x
        - MaxPool: 18x
        - Mean: 22x
        - Mul: 149x
        - Pack: 64x
        - Reshape: 69x
        - Sigmoid: 150x
        - Squeeze: 32x

----------------------------------------

TRTEngineOp_000_003              device:GPU:0  7       4             1             ['float32', 'f ... ['float32']        [[57600, 1], [ ... [[57600, 4]]      

        - ConcatV2: 1x
        - Const: 2x
        - Mul: 4x

----------------------------------------

TRTEngineOp_000_004              device:GPU:0  4       4             1             ['float32', 'f ... ['float32']        [[-1, 1], [-1, ... [[-1]]            

        - Mul: 1x
        - Squeeze: 1x
        - Sub: 2x

----------------------------------------

TRTEngineOp_000_005              device:GPU:0  4       4             1             ['float32', 'f ... ['float32']        [[-1, 1], [-1, ... [[-1]]            

        - Mul: 1x
        - Squeeze: 1x
        - Sub: 2x

----------------------------------------

TRTEngineOp_000_006              device:GPU:0  25      1             8             ['float32']        ['float32', 'f ... [[76725, 7]]       [[76725, 7], [ ...

        - Const: 10x
        - Reshape: 8x
        - Slice: 7x

----------------------------------------

TRTEngineOp_000_007              device:GPU:0  6       2             1             ['float32', 'f ... ['float32']        [[57600, 2], [ ... [[57600, 4]]      

        - AddV2: 1x
        - ConcatV2: 1x
        - Const: 2x
        - Mul: 1x
        - Sub: 1x

----------------------------------------

TRTEngineOp_000_008              device:GPU:0  3       1             1             ['float32']        ['float32']        [[76725, 1, 4]]    [[76725, 4]]      

        - Const: 1x
        - Reshape: 1x
        - Unpack: 1x

----------------------------------------

TRTEngineOp_000_009              device:GPU:0  3       1             2             ['float32']        ['float32', 'f ... [[1, 76725, 4]]    [[1, 76725, 1, ...

        - Const: 1x
        - ExpandDims: 1x
        - Squeeze: 1x

================================================================================================================================================================
[*] Total number of TensorRT engines: 9
[*] % of OPs Converted: 78.87% [1989/2522]

2024-04-01 20:36:49.217361: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: NOT_FOUND: TRTEngineCacheResource not yet created
2024-04-01 20:36:49.217649: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: NOT_FOUND: TRTEngineCacheResource not yet created
2024-04-01 20:36:49.217857: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: NOT_FOUND: TRTEngineCacheResource not yet created
2024-04-01 20:36:49.218032: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: NOT_FOUND: TRTEngineCacheResource not yet created
2024-04-01 20:36:49.218200: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: NOT_FOUND: TRTEngineCacheResource not yet created
2024-04-01 20:36:49.218390: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: NOT_FOUND: TRTEngineCacheResource not yet created
2024-04-01 20:36:49.218555: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: NOT_FOUND: TRTEngineCacheResource not yet created
2024-04-01 20:36:49.218810: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: NOT_FOUND: TRTEngineCacheResource not yet created
2024-04-01 20:36:49.218981: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: NOT_FOUND: TRTEngineCacheResource not yet created
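
As a sanity check on the converter.summary() table above, the per-engine node counts can be added up and compared with the reported totals. This is only a sketch: the numbers are copied by hand from the "# Nodes" column and the "% of OPs Converted" line, and the variable names are illustrative.

```python
# Node counts per engine, copied from the "# Nodes" column of the summary.
engine_nodes = {
    "TRTEngineOp_000_000": 6,
    "TRTEngineOp_000_001": 1931,
    "TRTEngineOp_000_003": 7,
    "TRTEngineOp_000_004": 4,
    "TRTEngineOp_000_005": 4,
    "TRTEngineOp_000_006": 25,
    "TRTEngineOp_000_007": 6,
    "TRTEngineOp_000_008": 3,
    "TRTEngineOp_000_009": 3,
}

# Total op count taken from the "[1989/2522]" part of the summary line.
total_ops = 2522

converted_ops = sum(engine_nodes.values())
converted_pct = 100 * converted_ops / total_ops

print(f"engines: {len(engine_nodes)}")
print(f"converted ops: {converted_ops}")
print(f"converted: {converted_pct:.2f}%")
```

The sums agree with the summary: 9 engines, 1989 converted ops, 78.87%. So the conversion step itself reports consistent results, and the NOT_FOUND warnings only appear after the summary is printed.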

My environment is as follows:

Python: 3.10.13
TensorFlow: 2.16.1
OS: Ubuntu 20.04
TensorRT: 8.6.1
CUDA: 12.1
NVIDIA driver: 530.30.02

Any help is highly appreciated. Thanks in advance!