Closed lebionick closed 1 year ago
Can you print the name and shape of each tensor
after this line: shape = engine.get_tensor_shape(name) ?
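Something like this should work (a sketch using the TensorRT 8.x Python API; `describe_tensor` is just a hypothetical formatting helper, and the loop assumes the `num_io_tensors` accessors available in TensorRT 8.5+):

```python
def describe_tensor(name, shape):
    """Format one I/O tensor for logging; an empty shape prints as ()."""
    return f"name={name!r} shape={tuple(shape)}"

def dump_io_tensors(engine):
    """Print every input/output tensor of a deserialized TensorRT engine."""
    for i in range(engine.num_io_tensors):  # TensorRT >= 8.5 API
        name = engine.get_tensor_name(i)
        print(describe_tensor(name, engine.get_tensor_shape(name)))
```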
The model includes three inputs; perhaps you should declare all of these shapes explicitly in --shapes.
LD_LIBRARY_PATH=TensorRT-8.6.1.6/lib/ TensorRT-8.6.1.6/bin/trtexec --onnx=./rtdetr_r50vd_6x_coco.onnx --workspace=4096 --shapes=image:1x3x640x640 --saveEngine=rtdetr_r50vd_6x_coco.trt --avgRuns=10 --fp16
name='im_shape' shape=(1, 2)
name='image' shape=(1, 3, 640, 640)
name='scale_factor' shape=(1, 2)
name='tile_3.tmp_0' shape=()
It crashes at tile_3.tmp_0, which seems to be the last layer(?)
I tried passing the shapes like this:
LD_LIBRARY_PATH=TensorRT-8.6.1.6/lib/ TensorRT-8.6.1.6/bin/trtexec --onnx=./rtdetr_r50vd_6x_coco.onnx --workspace=4096 --shapes="image:1x3x640x640,scale_factor:1x2,im_shape:1x2" --saveEngine=rtdetr_r50vd_6x_coco.trt --avgRuns=10 --fp16
I got no warnings, but tile_3.tmp_0 still has an empty shape.
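That is expected: the --shapes flag only constrains the network's inputs, and tile_3.tmp_0 is an output, so no --shapes value will affect it. For reference, the flag's value is a comma-separated list of name:DIMxDIMx... entries; a quick way to sanity-check what you are passing (pure-Python sketch, parse_shapes is a hypothetical helper, not part of trtexec):

```python
def parse_shapes(spec):
    """Parse a trtexec-style --shapes string into {name: (dims...)}."""
    shapes = {}
    for entry in spec.split(","):
        name, _, dims = entry.rpartition(":")
        shapes[name] = tuple(int(d) for d in dims.split("x"))
    return shapes

print(parse_shapes("image:1x3x640x640,scale_factor:1x2,im_shape:1x2"))
# -> {'image': (1, 3, 640, 640), 'scale_factor': (1, 2), 'im_shape': (1, 2)}
```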
[08/31/2023-14:55:01] [W] --workspace flag has been deprecated by --memPoolSize flag. [08/31/2023-14:55:01] [I] === Model Options === [08/31/2023-14:55:01] [I] Format: ONNX [08/31/2023-14:55:01] [I] Model: ./rtdetr_r50vd_6x_coco.onnx [08/31/2023-14:55:01] [I] Output: [08/31/2023-14:55:01] [I] === Build Options === [08/31/2023-14:55:01] [I] Max batch: explicit batch [08/31/2023-14:55:01] [I] Memory Pools: workspace: 4096 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default [08/31/2023-14:55:01] [I] minTiming: 1 [08/31/2023-14:55:01] [I] avgTiming: 8 [08/31/2023-14:55:01] [I] Precision: FP32+FP16 [08/31/2023-14:55:01] [I] LayerPrecisions: [08/31/2023-14:55:01] [I] Layer Device Types: [08/31/2023-14:55:01] [I] Calibration: [08/31/2023-14:55:01] [I] Refit: Disabled [08/31/2023-14:55:01] [I] Version Compatible: Disabled [08/31/2023-14:55:01] [I] TensorRT runtime: full [08/31/2023-14:55:01] [I] Lean DLL Path: [08/31/2023-14:55:01] [I] Tempfile Controls: { in_memory: allow, temporary: allow } [08/31/2023-14:55:01] [I] Exclude Lean Runtime: Disabled [08/31/2023-14:55:01] [I] Sparsity: Disabled [08/31/2023-14:55:01] [I] Safe mode: Disabled [08/31/2023-14:55:01] [I] Build DLA standalone loadable: Disabled [08/31/2023-14:55:01] [I] Allow GPU fallback for DLA: Disabled [08/31/2023-14:55:01] [I] DirectIO mode: Disabled [08/31/2023-14:55:01] [I] Restricted mode: Disabled [08/31/2023-14:55:01] [I] Skip inference: Disabled [08/31/2023-14:55:01] [I] Save engine: rtdetr_r50vd_6x_coco.trt [08/31/2023-14:55:01] [I] Load engine: [08/31/2023-14:55:01] [I] Profiling verbosity: 0 [08/31/2023-14:55:01] [I] Tactic sources: Using default tactic sources [08/31/2023-14:55:01] [I] timingCacheMode: local [08/31/2023-14:55:01] [I] timingCacheFile: [08/31/2023-14:55:01] [I] Heuristic: Disabled [08/31/2023-14:55:01] [I] Preview Features: Use default preview flags. 
[08/31/2023-14:55:01] [I] MaxAuxStreams: -1 [08/31/2023-14:55:01] [I] BuilderOptimizationLevel: -1 [08/31/2023-14:55:01] [I] Input(s)s format: fp32:CHW [08/31/2023-14:55:01] [I] Output(s)s format: fp32:CHW [08/31/2023-14:55:01] [I] Input build shape: image=1x3x640x640+1x3x640x640+1x3x640x640 [08/31/2023-14:55:01] [I] Input build shape: scale_factor=1x2+1x2+1x2 [08/31/2023-14:55:01] [I] Input build shape: im_shape=1x2+1x2+1x2 [08/31/2023-14:55:01] [I] Input calibration shapes: model [08/31/2023-14:55:01] [I] === System Options === [08/31/2023-14:55:01] [I] Device: 0 [08/31/2023-14:55:01] [I] DLACore: [08/31/2023-14:55:01] [I] Plugins: [08/31/2023-14:55:01] [I] setPluginsToSerialize: [08/31/2023-14:55:01] [I] dynamicPlugins: [08/31/2023-14:55:01] [I] ignoreParsedPluginLibs: 0 [08/31/2023-14:55:01] [I] [08/31/2023-14:55:01] [I] === Inference Options === [08/31/2023-14:55:01] [I] Batch: Explicit [08/31/2023-14:55:01] [I] Input inference shape: im_shape=1x2 [08/31/2023-14:55:01] [I] Input inference shape: scale_factor=1x2 [08/31/2023-14:55:01] [I] Input inference shape: image=1x3x640x640 [08/31/2023-14:55:01] [I] Iterations: 10 [08/31/2023-14:55:01] [I] Duration: 3s (+ 200ms warm up) [08/31/2023-14:55:01] [I] Sleep time: 0ms [08/31/2023-14:55:01] [I] Idle time: 0ms [08/31/2023-14:55:01] [I] Inference Streams: 1 [08/31/2023-14:55:01] [I] ExposeDMA: Disabled [08/31/2023-14:55:01] [I] Data transfers: Enabled [08/31/2023-14:55:01] [I] Spin-wait: Disabled [08/31/2023-14:55:01] [I] Multithreading: Disabled [08/31/2023-14:55:01] [I] CUDA Graph: Disabled [08/31/2023-14:55:01] [I] Separate profiling: Disabled [08/31/2023-14:55:01] [I] Time Deserialize: Disabled [08/31/2023-14:55:01] [I] Time Refit: Disabled [08/31/2023-14:55:01] [I] NVTX verbosity: 0 [08/31/2023-14:55:01] [I] Persistent Cache Ratio: 0 [08/31/2023-14:55:01] [I] Inputs: [08/31/2023-14:55:01] [I] === Reporting Options === [08/31/2023-14:55:01] [I] Verbose: Disabled [08/31/2023-14:55:01] [I] Averages: 10 inferences 
[08/31/2023-14:55:01] [I] Percentiles: 90,95,99 [08/31/2023-14:55:01] [I] Dump refittable layers:Disabled [08/31/2023-14:55:01] [I] Dump output: Disabled [08/31/2023-14:55:01] [I] Profile: Disabled [08/31/2023-14:55:01] [I] Export timing to JSON file: [08/31/2023-14:55:01] [I] Export output to JSON file: [08/31/2023-14:55:01] [I] Export profile to JSON file: [08/31/2023-14:55:01] [I] [08/31/2023-14:55:01] [I] === Device Information === [08/31/2023-14:55:01] [I] Selected Device: NVIDIA GeForce RTX 2080 Ti [08/31/2023-14:55:01] [I] Compute Capability: 7.5 [08/31/2023-14:55:01] [I] SMs: 68 [08/31/2023-14:55:01] [I] Device Global Memory: 11011 MiB [08/31/2023-14:55:01] [I] Shared Memory per SM: 64 KiB [08/31/2023-14:55:01] [I] Memory Bus Width: 352 bits (ECC disabled) [08/31/2023-14:55:01] [I] Application Compute Clock Rate: 1.65 GHz [08/31/2023-14:55:01] [I] Application Memory Clock Rate: 7 GHz [08/31/2023-14:55:01] [I] [08/31/2023-14:55:01] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at. [08/31/2023-14:55:01] [I] [08/31/2023-14:55:01] [I] TensorRT version: 8.6.1 [08/31/2023-14:55:01] [I] Loading standard plugins [08/31/2023-14:55:01] [I] [TRT] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 18, GPU 488 (MiB) [08/31/2023-14:55:06] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +896, GPU +174, now: CPU 991, GPU 662 (MiB) [08/31/2023-14:55:06] [I] Start parsing network model. 
[08/31/2023-14:55:06] [I] [TRT] ---------------------------------------------------------------- [08/31/2023-14:55:06] [I] [TRT] Input filename: ./rtdetr_r50vd_6x_coco.onnx [08/31/2023-14:55:06] [I] [TRT] ONNX IR version: 0.0.8 [08/31/2023-14:55:06] [I] [TRT] Opset version: 16 [08/31/2023-14:55:06] [I] [TRT] Producer name: [08/31/2023-14:55:06] [I] [TRT] Producer version: [08/31/2023-14:55:06] [I] [TRT] Domain: [08/31/2023-14:55:06] [I] [TRT] Model version: 0 [08/31/2023-14:55:06] [I] [TRT] Doc string: [08/31/2023-14:55:06] [I] [TRT] ---------------------------------------------------------------- [08/31/2023-14:55:06] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32. [08/31/2023-14:55:07] [I] Finished parsing network model. Parse time: 0.34223 [08/31/2023-14:55:07] [I] [TRT] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32. [08/31/2023-14:55:07] [W] [TRT] Detected layernorm nodes in FP16: p2o.Sub.0, p2o.Pow.0, p2o.Add.44, p2o.Sqrt.0, p2o.Div.0, p2o.Mul.2, p2o.Add.46, p2o.Sub.2, p2o.Pow.2, p2o.Add.56, p2o.Sqrt.2, p2o.Div.4, p2o.Mul.7, p2o.Add.58, p2o.Sub.4, p2o.Pow.4, p2o.Add.104, p2o.Sqrt.4, p2o.Div.6, p2o.Mul.57, p2o.Add.106, p2o.Sub.6, p2o.Pow.6, p2o.ReduceMean.14, p2o.Add.134, p2o.Sqrt.6, p2o.Div.8, p2o.Mul.61, p2o.Add.136, p2o.Sub.8, p2o.Pow.8, p2o.ReduceMean.18, p2o.Add.154, p2o.Sqrt.8, p2o.Div.10, p2o.Mul.81, p2o.Add.156, p2o.Sub.10, p2o.Pow.10, p2o.ReduceMean.22, p2o.Add.164, p2o.Sqrt.10, p2o.Div.12, p2o.Mul.83, p2o.Add.166, p2o.Sub.12, p2o.Pow.12, p2o.ReduceMean.26, p2o.Add.194, p2o.Sqrt.12, p2o.Div.16, p2o.Mul.89, p2o.Add.196, p2o.Sub.14, p2o.Pow.14, p2o.ReduceMean.30, p2o.Add.214, p2o.Sqrt.14, p2o.Div.18, p2o.Mul.109, p2o.Add.216, p2o.Sub.16, p2o.Pow.16, p2o.ReduceMean.34, p2o.Add.224, p2o.Sqrt.16, p2o.Div.20, p2o.Mul.111, p2o.Add.226, p2o.Sub.18, p2o.Pow.18, p2o.ReduceMean.38, p2o.Add.254, p2o.Sqrt.18, 
p2o.Div.24, p2o.Mul.117, p2o.Add.256, p2o.Sub.20, p2o.Pow.20, p2o.ReduceMean.42, p2o.Add.274, p2o.Sqrt.20, p2o.Div.26, p2o.Mul.137, p2o.Add.276, p2o.Sub.22, p2o.Pow.22, p2o.ReduceMean.46, p2o.Add.284, p2o.Sqrt.22, p2o.Div.28, p2o.Mul.139, p2o.Add.286, p2o.Sub.24, p2o.Pow.24, p2o.ReduceMean.50, p2o.Add.314, p2o.Sqrt.24, p2o.Div.32, p2o.Mul.145, p2o.Add.316, p2o.Sub.26, p2o.Pow.26, p2o.ReduceMean.54, p2o.Add.334, p2o.Sqrt.26, p2o.Div.34, p2o.Mul.165, p2o.Add.336, p2o.Sub.28, p2o.Pow.28, p2o.ReduceMean.58, p2o.Add.344, p2o.Sqrt.28, p2o.Div.36, p2o.Mul.167, p2o.Add.346, p2o.Sub.30, p2o.Pow.30, p2o.ReduceMean.62, p2o.Add.374, p2o.Sqrt.30, p2o.Div.40, p2o.Mul.173, p2o.Add.376, p2o.Sub.32, p2o.Pow.32, p2o.ReduceMean.66, p2o.Add.394, p2o.Sqrt.32, p2o.Div.42, p2o.Mul.193, p2o.Add.396, p2o.Sub.34, p2o.Pow.34, p2o.ReduceMean.70, p2o.Add.404, p2o.Sqrt.34, p2o.Div.44, p2o.Mul.195, p2o.Add.406, p2o.Sub.36, p2o.Pow.36, p2o.ReduceMean.74, p2o.Add.434, p2o.Sqrt.36, p2o.Div.48, p2o.Mul.201, p2o.Add.436, p2o.Sub.38, p2o.Pow.38, p2o.ReduceMean.78, p2o.Add.454, p2o.Sqrt.38, p2o.Div.50, p2o.Mul.221, p2o.Add.456, p2o.Sub.40, p2o.Pow.40, p2o.ReduceMean.82, p2o.Add.464, p2o.Sqrt.40, p2o.Div.52, p2o.Mul.223, p2o.Add.466, p2o.ReduceMean.10, p2o.ReduceMean.2, p2o.ReduceMean.6 [08/31/2023-14:55:07] [W] [TRT] Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy. [08/31/2023-14:55:07] [I] [TRT] Graph optimization time: 0.403732 seconds. [08/31/2023-14:55:07] [I] [TRT] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32. [08/31/2023-14:55:07] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored. [08/31/2023-15:00:26] [I] [TRT] Detected 3 inputs and 2 output network tensors. 
[08/31/2023-15:00:27] [I] [TRT] Total Host Persistent Memory: 439056 [08/31/2023-15:00:27] [I] [TRT] Total Device Persistent Memory: 834560 [08/31/2023-15:00:27] [I] [TRT] Total Scratch Memory: 14330880 [08/31/2023-15:00:27] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 172 MiB, GPU 73 MiB [08/31/2023-15:00:27] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 143 steps to complete. [08/31/2023-15:00:27] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 5.89313ms to assign 9 blocks to 143 nodes requiring 34204160 bytes. [08/31/2023-15:00:27] [I] [TRT] Total Activation Memory: 34204160 [08/31/2023-15:00:27] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy. [08/31/2023-15:00:27] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights. [08/31/2023-15:00:27] [W] [TRT] Check verbose logs for the list of affected weights. [08/31/2023-15:00:27] [W] [TRT] - 1 weights are affected by this issue: Detected FP32 infinity values and converted them to corresponding FP16 infinity. [08/31/2023-15:00:27] [W] [TRT] - 223 weights are affected by this issue: Detected subnormal FP16 values. [08/31/2023-15:00:27] [W] [TRT] - 63 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value. [08/31/2023-15:00:27] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +65, GPU +81, now: CPU 65, GPU 81 (MiB) [08/31/2023-15:00:27] [I] Engine built in 326.066 sec. [08/31/2023-15:00:27] [I] [TRT] Loaded engine size: 85 MiB [08/31/2023-15:00:27] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +81, now: CPU 0, GPU 81 (MiB) [08/31/2023-15:00:27] [I] Engine deserialized in 0.0433515 sec. 
[08/31/2023-15:00:27] [I] [TRT] [MS] Running engine with multi stream info [08/31/2023-15:00:27] [I] [TRT] [MS] Number of aux streams is 2 [08/31/2023-15:00:27] [I] [TRT] [MS] Number of total worker streams is 3 [08/31/2023-15:00:27] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream [08/31/2023-15:00:27] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +33, now: CPU 0, GPU 114 (MiB) [08/31/2023-15:00:27] [I] Setting persistentCacheLimit to 0 bytes. [08/31/2023-15:00:27] [I] Using random values for input im_shape [08/31/2023-15:00:27] [I] Input binding for im_shape with dimensions 1x2 is created. [08/31/2023-15:00:27] [I] Using random values for input image [08/31/2023-15:00:28] [I] Input binding for image with dimensions 1x3x640x640 is created. [08/31/2023-15:00:28] [I] Using random values for input scale_factor [08/31/2023-15:00:28] [I] Input binding for scale_factor with dimensions 1x2 is created. [08/31/2023-15:00:28] [I] Output binding for tile_3.tmp_0 with dimensions is created. [08/31/2023-15:00:28] [I] Output binding for reshape2_95.tmp_0 with dimensions 300x6 is created. 
[08/31/2023-15:00:28] [I] Starting inference [08/31/2023-15:00:31] [I] Warmup completed 45 queries over 200 ms [08/31/2023-15:00:31] [I] Timing trace has 669 queries over 3.01153 s [08/31/2023-15:00:31] [I] [08/31/2023-15:00:31] [I] === Trace details === [08/31/2023-15:00:31] [I] Trace averages of 10 runs: [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.45693 ms - Host latency: 5.00163 ms (enqueue 1.44989 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.45289 ms - Host latency: 4.99629 ms (enqueue 1.27771 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.49351 ms - Host latency: 5.03987 ms (enqueue 1.50723 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48447 ms - Host latency: 5.03333 ms (enqueue 1.45529 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.46169 ms - Host latency: 5.00724 ms (enqueue 1.81611 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.46317 ms - Host latency: 5.00313 ms (enqueue 1.16371 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.44719 ms - Host latency: 4.99411 ms (enqueue 1.92591 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.47023 ms - Host latency: 5.01824 ms (enqueue 2.03996 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.49032 ms - Host latency: 5.03456 ms (enqueue 2.03104 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.49633 ms - Host latency: 5.04483 ms (enqueue 1.82703 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48669 ms - Host latency: 5.02964 ms (enqueue 1.8278 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48203 ms - Host latency: 5.02427 ms (enqueue 1.39008 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48318 ms - Host latency: 5.02858 ms (enqueue 1.37376 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48325 ms - Host latency: 5.02468 ms (enqueue 1.27521 ms) [08/31/2023-15:00:31] [I] 
Average on 10 runs - GPU latency: 4.5146 ms - Host latency: 5.06036 ms (enqueue 1.08528 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.51545 ms - Host latency: 5.06132 ms (enqueue 1.78871 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.49631 ms - Host latency: 5.04066 ms (enqueue 1.78268 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.4944 ms - Host latency: 5.04268 ms (enqueue 1.63381 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48693 ms - Host latency: 5.02889 ms (enqueue 1.44833 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.4881 ms - Host latency: 5.03709 ms (enqueue 1.91923 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48943 ms - Host latency: 5.03877 ms (enqueue 1.9224 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.47816 ms - Host latency: 5.02123 ms (enqueue 1.57825 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48765 ms - Host latency: 5.03352 ms (enqueue 1.05079 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48776 ms - Host latency: 5.02734 ms (enqueue 1.0729 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.49819 ms - Host latency: 5.0441 ms (enqueue 1.86301 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.52578 ms - Host latency: 5.07155 ms (enqueue 1.98652 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.5193 ms - Host latency: 5.06476 ms (enqueue 1.92823 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48706 ms - Host latency: 5.03068 ms (enqueue 1.09467 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.47623 ms - Host latency: 5.02465 ms (enqueue 1.70804 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.46544 ms - Host latency: 5.00983 ms (enqueue 2.03997 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.50546 ms - Host latency: 5.04865 ms (enqueue 2.05879 ms) 
[08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.51282 ms - Host latency: 5.05753 ms (enqueue 1.87915 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.50448 ms - Host latency: 5.04973 ms (enqueue 1.43376 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48164 ms - Host latency: 5.01892 ms (enqueue 1.56187 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.47012 ms - Host latency: 5.00725 ms (enqueue 1.41846 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.50891 ms - Host latency: 5.05251 ms (enqueue 1.05018 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.5177 ms - Host latency: 5.05718 ms (enqueue 1.60492 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48661 ms - Host latency: 5.022 ms (enqueue 1.25336 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48765 ms - Host latency: 5.0371 ms (enqueue 1.88019 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48152 ms - Host latency: 5.02531 ms (enqueue 2.06493 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.51396 ms - Host latency: 5.05807 ms (enqueue 2.06946 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.51018 ms - Host latency: 5.06058 ms (enqueue 2.05632 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48259 ms - Host latency: 5.03088 ms (enqueue 1.86101 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.47998 ms - Host latency: 5.02954 ms (enqueue 2.07688 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.49255 ms - Host latency: 5.03833 ms (enqueue 2.05876 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.52056 ms - Host latency: 5.06689 ms (enqueue 2.0646 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.50173 ms - Host latency: 5.04526 ms (enqueue 1.44507 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48713 ms - Host latency: 5.03086 ms 
(enqueue 1.45964 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.47297 ms - Host latency: 5.0217 ms (enqueue 1.98777 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.4876 ms - Host latency: 5.03291 ms (enqueue 1.72166 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.50876 ms - Host latency: 5.05713 ms (enqueue 2.04985 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.51277 ms - Host latency: 5.05891 ms (enqueue 2.04622 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.47832 ms - Host latency: 5.02549 ms (enqueue 2.03547 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48232 ms - Host latency: 5.02056 ms (enqueue 1.15129 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.46726 ms - Host latency: 5.01082 ms (enqueue 1.07268 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.51665 ms - Host latency: 5.06235 ms (enqueue 1.45144 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.51802 ms - Host latency: 5.06448 ms (enqueue 2.05554 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.49619 ms - Host latency: 5.04353 ms (enqueue 2.02344 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.49478 ms - Host latency: 5.03818 ms (enqueue 1.64333 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48669 ms - Host latency: 5.03503 ms (enqueue 2.04768 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.50999 ms - Host latency: 5.0573 ms (enqueue 2.04263 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.51379 ms - Host latency: 5.05249 ms (enqueue 1.92029 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.54653 ms - Host latency: 5.09473 ms (enqueue 2.05488 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.52471 ms - Host latency: 5.07537 ms (enqueue 2.03877 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.50703 ms - Host 
latency: 5.04866 ms (enqueue 2.06172 ms) [08/31/2023-15:00:31] [I] Average on 10 runs - GPU latency: 4.48103 ms - Host latency: 5.02456 ms (enqueue 2.02048 ms) [08/31/2023-15:00:31] [I] [08/31/2023-15:00:31] [I] === Performance summary === [08/31/2023-15:00:31] [I] Throughput: 222.146 qps [08/31/2023-15:00:31] [I] Latency: min = 4.97229 ms, max = 5.25537 ms, mean = 5.03733 ms, median = 5.03564 ms, percentile(90%) = 5.07141 ms, percentile(95%) = 5.07983 ms, percentile(99%) = 5.10303 ms [08/31/2023-15:00:31] [I] Enqueue Time: min = 0.800537 ms, max = 2.9259 ms, mean = 1.70667 ms, median = 2.02332 ms, percentile(90%) = 2.09106 ms, percentile(95%) = 2.12305 ms, percentile(99%) = 2.23798 ms [08/31/2023-15:00:31] [I] H2D Latency: min = 0.498535 ms, max = 0.581665 ms, mean = 0.537984 ms, median = 0.53833 ms, percentile(90%) = 0.547852 ms, percentile(95%) = 0.551025 ms, percentile(99%) = 0.557373 ms [08/31/2023-15:00:31] [I] GPU Compute Time: min = 4.42865 ms, max = 4.71533 ms, mean = 4.49246 ms, median = 4.48999 ms, percentile(90%) = 4.52344 ms, percentile(95%) = 4.53076 ms, percentile(99%) = 4.55444 ms [08/31/2023-15:00:31] [I] D2H Latency: min = 0.00488281 ms, max = 0.0146484 ms, mean = 0.00689398 ms, median = 0.00683594 ms, percentile(90%) = 0.00805664 ms, percentile(95%) = 0.00842285 ms, percentile(99%) = 0.009552 ms [08/31/2023-15:00:31] [I] Total Host Walltime: 3.01153 s [08/31/2023-15:00:31] [I] Total GPU Compute Time: 3.00545 s [08/31/2023-15:00:31] [I] Explanations of the performance metrics are printed in the verbose logs. [08/31/2023-15:00:31] [I] &&&& PASSED TensorRT.trtexec [TensorRT v8601] # TensorRT-8.6.1.6/bin/trtexec --onnx=./rtdetr_r50vd_6x_coco.onnx --workspace=4096 --shapes=image:1x3x640x640,scale_factor:1x2,im_shape:1x2 --saveEngine=rtdetr_r50vd_6x_coco.trt --avgRuns=10 --fp16
Yes, that should be the output name. It doesn't have a shape, perhaps due to version issues.
You can try onnxsim
to process the ONNX file, then check whether the output has a shape:
onnxsim rtdetr_r50vd_6x_coco.onnx rtdetr_r50vd_6x_coco_new.onnx --overwrite-input-shape im_shape:1,2 image:1,3,640,640 scale_factor:1,2
I've installed paddlepaddle 2.4.2 and TRT now works, as does onnxsim!
(python3 -m pip install paddlepaddle-gpu==2.4.2.post117 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html)
Hello, I have an issue with launching the converted TRT model.
I converted the default model step by step, like this:
The last step was tricky: I downloaded the TensorRT GA archive and built trtexec inside it.
Convert Log
To run it, I'm using the TRTInference class from the benchmark directory,
and I get this error:
So one of the shapes is an empty tuple, ().
I wonder if you could help me; maybe I'm doing something wrong during the conversion.
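For what it's worth, an empty shape like this can fail in confusing ways downstream: the usual product-of-dims buffer sizing treats () as volume 1 (the empty product), so allocation may "succeed" with a bogus one-element buffer instead of erroring at the binding. A minimal illustration (volume here is just math.prod, a stand-in for however the benchmark code sizes its buffers, not the actual TRTInference implementation):

```python
import math

def volume(shape):
    """Number of elements implied by a shape; the empty product is 1."""
    return math.prod(shape)

volume(())                 # -> 1, even though the binding has no real shape info
volume((1, 3, 640, 640))   # -> 1228800
```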
P.S. paddlepaddle-gpu==2.5.1.post117