triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

TensorRT: batching is unavailable #5479

Closed: entn-at closed this issue 1 year ago

entn-at commented 1 year ago

Description

I'm trying to run a model (TitaNet-Large from NeMo) converted to TensorRT in Triton. It has dynamic shapes and was converted with a maximum batch size of 16; however, Triton first instructed me to set max_batch_size: 0, and now I get a warning that "The specified dimensions in model MODEL_NAME config hints that batching is unavailable".

Below is the output of polygraphy inspect model titanet_large.plan:

[I] ==== TensorRT Engine ====
    Name: Unnamed Network 0 | Explicit Batch Engine

    ---- 2 Engine Input(s) ----
    {audio_signal [dtype=float32, shape=(-1, 80, -1)],
     length [dtype=int32, shape=(-1,)]}

    ---- 2 Engine Output(s) ----
    {logits [dtype=float32, shape=(-1, 16681)],
     embs [dtype=float32, shape=(-1, 192)]}

    ---- Memory ----
    Device Memory: 2261335040 bytes

    ---- 1 Profile(s) (4 Tensor(s) Each) ----
    - Profile: 0
        Tensor: audio_signal          (Input), Index: 0 | Shapes: min=(1, 80, 100), opt=(16, 80, 200), max=(16, 80, 3000)
        Tensor: length                (Input), Index: 1 | Shapes: min=(1,), opt=(16,), max=(16,)
        Tensor: logits               (Output), Index: 2 | Shape: (-1, 16681)
        Tensor: embs                 (Output), Index: 3 | Shape: (-1, 192)

    ---- 356 Layer(s) ----

Below is the model config:

backend: "tensorrt"
version_policy {
  latest {
    num_versions: 1
  }
}
max_batch_size: 0
input [
  {
    name: "length"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  {
    name: "audio_signal"
    data_type: TYPE_FP32
    dims: [ -1, 80, -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, 16681 ]
  },
  {
    name: "embs"
    data_type: TYPE_FP32
    dims: [ -1, 192 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]

I exported TitaNet-Large (in NeMo) to ONNX via model.export() and then converted it to a TRT engine (see logs below).
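Roughly like this (a minimal sketch of the export step; the exact NeMo model class and checkpoint name may differ):

import nemo.collections.asr as nemo_asr

# Load the pretrained TitaNet-Large speaker embedding model and export it
# to ONNX; the exported graph has dynamic batch and time axes, as seen in
# the polygraphy output above.
model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(model_name="titanet_large")
model.export("titanet_large.onnx")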

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 22.12 (build 49968236)
NVIDIA TensorRT Version 8.5.1
Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

&&&& RUNNING TensorRT.trtexec [TensorRT v8501] # trtexec --dumpLayerInfo --onnx=/models/titanet_large.onnx --minShapes=audio_signal:1x80x100,length:1 --optShapes=audio_signal:16x80x200,length:16 --maxShapes=audio_signal:16x80x3000,length:16 --fp16 --saveEngine=/models/titanet_large.plan
[03/09/2023-21:44:13] [I] === Model Options ===
[03/09/2023-21:44:13] [I] Format: ONNX
[03/09/2023-21:44:13] [I] Model: /models/titanet_large.onnx
[03/09/2023-21:44:13] [I] Output:
[03/09/2023-21:44:13] [I] === Build Options ===
[03/09/2023-21:44:13] [I] Max batch: explicit batch
[03/09/2023-21:44:13] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/09/2023-21:44:13] [I] minTiming: 1
[03/09/2023-21:44:13] [I] avgTiming: 8
[03/09/2023-21:44:13] [I] Precision: FP32+FP16
[03/09/2023-21:44:13] [I] LayerPrecisions: 
[03/09/2023-21:44:13] [I] Calibration: 
[03/09/2023-21:44:13] [I] Refit: Disabled
[03/09/2023-21:44:13] [I] Sparsity: Disabled
[03/09/2023-21:44:13] [I] Safe mode: Disabled
[03/09/2023-21:44:13] [I] DirectIO mode: Disabled
[03/09/2023-21:44:13] [I] Restricted mode: Disabled
[03/09/2023-21:44:13] [I] Build only: Disabled
[03/09/2023-21:44:13] [I] Save engine: /models/titanet_large.plan
[03/09/2023-21:44:13] [I] Load engine: 
[03/09/2023-21:44:13] [I] Profiling verbosity: 0
[03/09/2023-21:44:13] [I] Tactic sources: Using default tactic sources
[03/09/2023-21:44:13] [I] timingCacheMode: local
[03/09/2023-21:44:13] [I] timingCacheFile: 
[03/09/2023-21:44:13] [I] Heuristic: Disabled
[03/09/2023-21:44:13] [I] Preview Features: Use default preview flags.
[03/09/2023-21:44:13] [I] Input(s)s format: fp32:CHW
[03/09/2023-21:44:13] [I] Output(s)s format: fp32:CHW
[03/09/2023-21:44:13] [I] Input build shape: audio_signal=1x80x100+16x80x200+16x80x3000
[03/09/2023-21:44:13] [I] Input build shape: length=1+16+16
[03/09/2023-21:44:13] [I] Input calibration shapes: model
[03/09/2023-21:44:13] [I] === System Options ===
[03/09/2023-21:44:13] [I] Device: 0
[03/09/2023-21:44:13] [I] DLACore: 
[03/09/2023-21:44:13] [I] Plugins:
[03/09/2023-21:44:13] [I] === Inference Options ===
[03/09/2023-21:44:13] [I] Batch: Explicit
[03/09/2023-21:44:13] [I] Input inference shape: length=16
[03/09/2023-21:44:13] [I] Input inference shape: audio_signal=16x80x200
[03/09/2023-21:44:13] [I] Iterations: 10
[03/09/2023-21:44:13] [I] Duration: 3s (+ 200ms warm up)
[03/09/2023-21:44:13] [I] Sleep time: 0ms
[03/09/2023-21:44:13] [I] Idle time: 0ms
[03/09/2023-21:44:13] [I] Streams: 1
[03/09/2023-21:44:13] [I] ExposeDMA: Disabled
[03/09/2023-21:44:13] [I] Data transfers: Enabled
[03/09/2023-21:44:13] [I] Spin-wait: Disabled
[03/09/2023-21:44:13] [I] Multithreading: Disabled
[03/09/2023-21:44:13] [I] CUDA Graph: Disabled
[03/09/2023-21:44:13] [I] Separate profiling: Disabled
[03/09/2023-21:44:13] [I] Time Deserialize: Disabled
[03/09/2023-21:44:13] [I] Time Refit: Disabled
[03/09/2023-21:44:13] [I] NVTX verbosity: 0
[03/09/2023-21:44:13] [I] Persistent Cache Ratio: 0
[03/09/2023-21:44:13] [I] Inputs:
[03/09/2023-21:44:13] [I] === Reporting Options ===
[03/09/2023-21:44:13] [I] Verbose: Disabled
[03/09/2023-21:44:13] [I] Averages: 10 inferences
[03/09/2023-21:44:13] [I] Percentiles: 90,95,99
[03/09/2023-21:44:13] [I] Dump refittable layers:Disabled
[03/09/2023-21:44:13] [I] Dump output: Disabled
[03/09/2023-21:44:13] [I] Profile: Disabled
[03/09/2023-21:44:13] [I] Export timing to JSON file: 
[03/09/2023-21:44:13] [I] Export output to JSON file: 
[03/09/2023-21:44:13] [I] Export profile to JSON file: 
[03/09/2023-21:44:13] [I] 
[03/09/2023-21:44:13] [I] === Device Information ===
[03/09/2023-21:44:13] [I] Selected Device: NVIDIA GeForce RTX 2080 Ti
[03/09/2023-21:44:13] [I] Compute Capability: 7.5
[03/09/2023-21:44:13] [I] SMs: 68
[03/09/2023-21:44:13] [I] Compute Clock Rate: 1.545 GHz
[03/09/2023-21:44:13] [I] Device Global Memory: 11011 MiB
[03/09/2023-21:44:13] [I] Shared Memory per SM: 64 KiB
[03/09/2023-21:44:13] [I] Memory Bus Width: 352 bits (ECC disabled)
[03/09/2023-21:44:13] [I] Memory Clock Rate: 7 GHz
[03/09/2023-21:44:13] [I] 
[03/09/2023-21:44:13] [I] TensorRT version: 8.5.1
[03/09/2023-21:44:14] [I] [TRT] [MemUsageChange] Init CUDA: CPU +538, GPU +0, now: CPU 551, GPU 317 (MiB)
[03/09/2023-21:44:17] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +339, GPU +76, now: CPU 945, GPU 393 (MiB)
[03/09/2023-21:44:17] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[03/09/2023-21:44:17] [I] Start parsing network model
[03/09/2023-21:44:18] [I] [TRT] ----------------------------------------------------------------
[03/09/2023-21:44:18] [I] [TRT] Input filename:   /models/titanet_large.onnx
[03/09/2023-21:44:18] [I] [TRT] ONNX IR version:  0.0.8
[03/09/2023-21:44:18] [I] [TRT] Opset version:    16
[03/09/2023-21:44:18] [I] [TRT] Producer name:    pytorch
[03/09/2023-21:44:18] [I] [TRT] Producer version: 1.13.1
[03/09/2023-21:44:18] [I] [TRT] Domain:           
[03/09/2023-21:44:18] [I] [TRT] Model version:    0
[03/09/2023-21:44:18] [I] [TRT] Doc string:       
[03/09/2023-21:44:18] [I] [TRT] ----------------------------------------------------------------
[03/09/2023-21:44:18] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/09/2023-21:44:21] [I] Finish parsing network model
[03/09/2023-21:44:21] [W] [TRT] Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
[... the same PreviewFeature warning is printed 8 more times; omitted for brevity ...]
[03/09/2023-21:44:24] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +771, GPU +192, now: CPU 1857, GPU 591 (MiB)
[03/09/2023-21:44:24] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +235, GPU +54, now: CPU 2092, GPU 645 (MiB)
[03/09/2023-21:44:24] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/09/2023-21:44:49] [W] [TRT] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are: 
[03/09/2023-21:44:49] [W] [TRT]  audio_signal_dynamic_axes_2
[03/09/2023-21:44:49] [W] [TRT]  audio_signal_dynamic_axes_1
[03/09/2023-21:44:49] [W] [TRT]  length_dynamic_axes_1
[... the same Myelin dynamic-values warning repeats 17 more times over the course of the build; omitted for brevity ...]
[03/09/2023-21:47:48] [I] [TRT] Total Activation Memory: 22954259456
[03/09/2023-21:47:48] [I] [TRT] Detected 2 inputs and 2 output network tensors.
[03/09/2023-21:47:48] [W] [TRT] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are: 
[03/09/2023-21:47:48] [W] [TRT]  audio_signal_dynamic_axes_2
[03/09/2023-21:47:48] [W] [TRT]  audio_signal_dynamic_axes_1
[03/09/2023-21:47:48] [W] [TRT]  length_dynamic_axes_1
[... repeated 5 more times; omitted for brevity ...]
[03/09/2023-21:47:49] [I] [TRT] Total Host Persistent Memory: 97328
[03/09/2023-21:47:49] [I] [TRT] Total Device Persistent Memory: 276480
[03/09/2023-21:47:49] [I] [TRT] Total Scratch Memory: 884931072
[03/09/2023-21:47:49] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 69 MiB, GPU 7070 MiB
[03/09/2023-21:47:49] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 371 steps to complete.
[03/09/2023-21:47:49] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 47.5347ms to assign 10 blocks to 371 nodes requiring 2261335040 bytes.
[03/09/2023-21:47:49] [I] [TRT] Total Activation Memory: 2261335040
[03/09/2023-21:47:49] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2799, GPU 903 (MiB)
[03/09/2023-21:47:49] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[03/09/2023-21:47:49] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[03/09/2023-21:47:49] [W] [TRT] Check verbose logs for the list of affected weights.
[03/09/2023-21:47:49] [W] [TRT] - 1 weights are affected by this issue: Detected FP32 infinity values and converted them to corresponding FP16 infinity.
[03/09/2023-21:47:49] [W] [TRT] - 61 weights are affected by this issue: Detected subnormal FP16 values.
[03/09/2023-21:47:49] [W] [TRT] - 58 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[03/09/2023-21:47:49] [W] [TRT] - 2 weights are affected by this issue: Detected finite FP32 values which would overflow in FP16 and converted them to the closest finite FP16 value.
[03/09/2023-21:47:49] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +50, GPU +49, now: CPU 50, GPU 49 (MiB)
[03/09/2023-21:47:49] [I] Engine built in 216.225 sec.
[03/09/2023-21:47:49] [I] [TRT] Loaded engine size: 50 MiB
[03/09/2023-21:47:49] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 2354, GPU 817 (MiB)
[03/09/2023-21:47:49] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +48, now: CPU 0, GPU 48 (MiB)
[03/09/2023-21:47:49] [I] Engine deserialized in 0.0315335 sec.
[03/09/2023-21:47:50] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2354, GPU 817 (MiB)
[03/09/2023-21:47:50] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2157, now: CPU 0, GPU 2205 (MiB)
[03/09/2023-21:47:50] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[03/09/2023-21:47:50] [I] Setting persistentCacheLimit to 0 bytes.
[03/09/2023-21:47:50] [I] Using random values for input audio_signal
[03/09/2023-21:47:50] [I] Created input binding for audio_signal with dimensions 16x80x200
[03/09/2023-21:47:50] [I] Using random values for input length
[03/09/2023-21:47:50] [I] Created input binding for length with dimensions 16
[03/09/2023-21:47:50] [I] Using random values for output logits
[03/09/2023-21:47:50] [I] Created output binding for logits with dimensions 16x16681
[03/09/2023-21:47:50] [I] Using random values for output embs
[03/09/2023-21:47:50] [I] Created output binding for embs with dimensions 16x192
[03/09/2023-21:47:50] [I] Layer Information:
[03/09/2023-21:47:50] [I] [TRT] The profiling verbosity was set to ProfilingVerbosity::kLAYER_NAMES_ONLY when the engine was built, so only the layer names will be returned. Rebuild the engine with ProfilingVerbosity::kDETAILED to get more verbose layer information.
[03/09/2023-21:47:50] [I] Layers:
[HostToDeviceCopy 0]
Reformatting CopyNode for Input Tensor 0 to (Unnamed Layer* 5) [Shuffle]
(Unnamed Layer* 5) [Shuffle]
shuffle_between_(Unnamed Layer* 5) [Shuffle]_output_and_/encoder/encoder/encoder.0/mconv.0/conv/Conv
/encoder/encoder/encoder.0/mconv.0/conv/Conv
shuffle_after_(Unnamed Layer* 6) [Convolution]_output
shuffle_between_(Unnamed Layer* 6) [Convolution]_output_and_/encoder/encoder/encoder.0/mconv.1/conv/Conv
/encoder/encoder/encoder.0/mconv.1/conv/Conv
shuffle_after_(Unnamed Layer* 17) [Convolution]_output
{ForeignNode[/encoder/encoder/encoder.0/mconv.3/Unsqueeze_2...(Unnamed Layer* 146) [Shuffle]]}
shuffle_between_(Unnamed Layer* 146) [Shuffle]_output_and_/encoder/encoder/encoder.1/mconv.0/conv/Conv
/encoder/encoder/encoder.1/mconv.0/conv/Conv
shuffle_after_(Unnamed Layer* 147) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 147) [Convolution]_output_and_/encoder/encoder/encoder.1/mconv.1/conv/Conv
shuffle_between_(Unnamed Layer* 147) [Convolution]_output_and_/encoder/encoder/encoder.1/mconv.1/conv/Conv
/encoder/encoder/encoder.1/mconv.1/conv/Conv
Reformatting CopyNode for Input Tensor 0 to shuffle_after_(Unnamed Layer* 158) [Convolution]_output
shuffle_after_(Unnamed Layer* 158) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 388) [Shuffle]_output_and_/encoder/encoder/encoder.1/res.0.0/conv/Conv
shuffle_between_(Unnamed Layer* 388) [Shuffle]_output_and_/encoder/encoder/encoder.1/res.0.0/conv/Conv
/encoder/encoder/encoder.1/res.0.0/conv/Conv
shuffle_after_(Unnamed Layer* 389) [Convolution]_output
(Unnamed Layer* 162) [Shuffle]
/encoder/encoder/encoder.1/mconv.2/Cast
/encoder/encoder/encoder.1/mconv.2/Cast_1
/encoder/encoder/encoder.1/mconv.2/Cast_2
/encoder/encoder/encoder.1/mconv.2/Cast_3
/encoder/encoder/encoder.1/mconv.2/Cast_4
/encoder/encoder/encoder.1/mconv.2/Cast_5
/encoder/encoder/encoder.1/mconv.2/Cast_6
/encoder/encoder/encoder.1/mconv.2/Cast_7
/encoder/encoder/encoder.1/mconv.2/Cast_8
/encoder/encoder/encoder.1/mconv.2/Cast_9
/encoder/encoder/encoder.1/mconv.2/Cast_10
/encoder/encoder/encoder.1/mconv.2/Cast_11
(Unnamed Layer* 180) [Shuffle]
/encoder/encoder/encoder.1/mconv.2/BatchNormalization
(Unnamed Layer* 182) [Shuffle]
/encoder/encoder/encoder.1/mconv.2/Cast_12
/encoder/encoder/encoder.1/mconv.2/Cast_13
/encoder/encoder/encoder.1/mconv.2/Cast_14
/encoder/encoder/encoder.1/mconv.2/Cast_15
/encoder/encoder/encoder.1/mconv.2/Cast_16
/encoder/encoder/encoder.1/mconv.2/Cast_17
/encoder/encoder/encoder.1/mconv.2/Cast_18
/encoder/encoder/encoder.1/mconv.2/Cast_19
/encoder/encoder/encoder.1/mconv.2/Cast_20
/encoder/encoder/encoder.1/mconv.2/Cast_21
/encoder/encoder/encoder.1/mconv.2/Cast_22
/encoder/encoder/encoder.1/mconv.2/Cast_23
PWN(/encoder/encoder/encoder.1/fc.1/Relu)
(Unnamed Layer* 202) [Shuffle]
shuffle_between_(Unnamed Layer* 202) [Shuffle]_output_and_/encoder/encoder/encoder.1/mconv.5/conv/Conv
/encoder/encoder/encoder.1/mconv.5/conv/Conv
shuffle_after_(Unnamed Layer* 203) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 203) [Convolution]_output_and_/encoder/encoder/encoder.1/mconv.6/conv/Conv
shuffle_between_(Unnamed Layer* 203) [Convolution]_output_and_/encoder/encoder/encoder.1/mconv.6/conv/Conv
/encoder/encoder/encoder.1/mconv.6/conv/Conv
Reformatting CopyNode for Input Tensor 0 to shuffle_after_(Unnamed Layer* 214) [Convolution]_output
shuffle_after_(Unnamed Layer* 214) [Convolution]_output
(Unnamed Layer* 218) [Shuffle]
/encoder/encoder/encoder.1/mconv.7/Cast
/encoder/encoder/encoder.1/mconv.7/Cast_1
/encoder/encoder/encoder.1/mconv.7/Cast_2
/encoder/encoder/encoder.1/mconv.7/Cast_3
/encoder/encoder/encoder.1/mconv.7/Cast_4
/encoder/encoder/encoder.1/mconv.7/Cast_5
/encoder/encoder/encoder.1/mconv.7/Cast_6
/encoder/encoder/encoder.1/mconv.7/Cast_7
/encoder/encoder/encoder.1/mconv.7/Cast_8
/encoder/encoder/encoder.1/mconv.7/Cast_9
/encoder/encoder/encoder.1/mconv.7/Cast_10
/encoder/encoder/encoder.1/mconv.7/Cast_11
(Unnamed Layer* 236) [Shuffle]
/encoder/encoder/encoder.1/mconv.7/BatchNormalization
(Unnamed Layer* 238) [Shuffle]
/encoder/encoder/encoder.1/mconv.7/Cast_12
/encoder/encoder/encoder.1/mconv.7/Cast_13
/encoder/encoder/encoder.1/mconv.7/Cast_14
/encoder/encoder/encoder.1/mconv.7/Cast_15
/encoder/encoder/encoder.1/mconv.7/Cast_16
/encoder/encoder/encoder.1/mconv.7/Cast_17
/encoder/encoder/encoder.1/mconv.7/Cast_18
/encoder/encoder/encoder.1/mconv.7/Cast_19
/encoder/encoder/encoder.1/mconv.7/Cast_20
/encoder/encoder/encoder.1/mconv.7/Cast_21
/encoder/encoder/encoder.1/mconv.7/Cast_22
/encoder/encoder/encoder.1/mconv.7/Cast_23
PWN(/encoder/encoder/encoder.1/fc.1_1/Relu)
(Unnamed Layer* 258) [Shuffle]
shuffle_between_(Unnamed Layer* 258) [Shuffle]_output_and_/encoder/encoder/encoder.1/mconv.10/conv/Conv
/encoder/encoder/encoder.1/mconv.10/conv/Conv
shuffle_after_(Unnamed Layer* 259) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 259) [Convolution]_output_and_/encoder/encoder/encoder.1/mconv.11/conv/Conv
shuffle_between_(Unnamed Layer* 259) [Convolution]_output_and_/encoder/encoder/encoder.1/mconv.11/conv/Conv
/encoder/encoder/encoder.1/mconv.11/conv/Conv
shuffle_after_(Unnamed Layer* 270) [Convolution]_output
Reformatting CopyNode for Input Tensor 1 to {ForeignNode[/encoder/encoder/encoder.1/mconv.13/Cast...(Unnamed Layer* 434) [Shuffle]]}
Reformatting CopyNode for Input Tensor 2 to {ForeignNode[/encoder/encoder/encoder.1/mconv.13/Cast...(Unnamed Layer* 434) [Shuffle]]}
{ForeignNode[/encoder/encoder/encoder.1/mconv.13/Cast...(Unnamed Layer* 434) [Shuffle]]}
shuffle_between_(Unnamed Layer* 434) [Shuffle]_output_and_/encoder/encoder/encoder.2/mconv.0/conv/Conv
/encoder/encoder/encoder.2/mconv.0/conv/Conv
shuffle_after_(Unnamed Layer* 435) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 435) [Convolution]_output_and_/encoder/encoder/encoder.2/mconv.1/conv/Conv
shuffle_between_(Unnamed Layer* 435) [Convolution]_output_and_/encoder/encoder/encoder.2/mconv.1/conv/Conv
/encoder/encoder/encoder.2/mconv.1/conv/Conv
Reformatting CopyNode for Input Tensor 0 to shuffle_after_(Unnamed Layer* 446) [Convolution]_output
shuffle_after_(Unnamed Layer* 446) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 676) [Shuffle]_output_and_/encoder/encoder/encoder.2/res.0.0/conv/Conv
shuffle_between_(Unnamed Layer* 676) [Shuffle]_output_and_/encoder/encoder/encoder.2/res.0.0/conv/Conv
/encoder/encoder/encoder.2/res.0.0/conv/Conv
shuffle_after_(Unnamed Layer* 677) [Convolution]_output
(Unnamed Layer* 450) [Shuffle]
/encoder/encoder/encoder.2/mconv.2/Cast
/encoder/encoder/encoder.2/mconv.2/Cast_1
/encoder/encoder/encoder.2/mconv.2/Cast_2
/encoder/encoder/encoder.2/mconv.2/Cast_3
/encoder/encoder/encoder.2/mconv.2/Cast_4
/encoder/encoder/encoder.2/mconv.2/Cast_5
/encoder/encoder/encoder.2/mconv.2/Cast_6
/encoder/encoder/encoder.2/mconv.2/Cast_7
/encoder/encoder/encoder.2/mconv.2/Cast_8
/encoder/encoder/encoder.2/mconv.2/Cast_9
/encoder/encoder/encoder.2/mconv.2/Cast_10
/encoder/encoder/encoder.2/mconv.2/Cast_11
(Unnamed Layer* 468) [Shuffle]
/encoder/encoder/encoder.2/mconv.2/BatchNormalization
(Unnamed Layer* 470) [Shuffle]
/encoder/encoder/encoder.2/mconv.2/Cast_12
/encoder/encoder/encoder.2/mconv.2/Cast_13
/encoder/encoder/encoder.2/mconv.2/Cast_14
/encoder/encoder/encoder.2/mconv.2/Cast_15
/encoder/encoder/encoder.2/mconv.2/Cast_16
/encoder/encoder/encoder.2/mconv.2/Cast_17
/encoder/encoder/encoder.2/mconv.2/Cast_18
/encoder/encoder/encoder.2/mconv.2/Cast_19
/encoder/encoder/encoder.2/mconv.2/Cast_20
/encoder/encoder/encoder.2/mconv.2/Cast_21
/encoder/encoder/encoder.2/mconv.2/Cast_22
/encoder/encoder/encoder.2/mconv.2/Cast_23
Reformatting CopyNode for Input Tensor 0 to PWN(/encoder/encoder/encoder.2/fc.1/Relu)
PWN(/encoder/encoder/encoder.2/fc.1/Relu)
(Unnamed Layer* 490) [Shuffle]
shuffle_between_(Unnamed Layer* 490) [Shuffle]_output_and_/encoder/encoder/encoder.2/mconv.5/conv/Conv
/encoder/encoder/encoder.2/mconv.5/conv/Conv
shuffle_after_(Unnamed Layer* 491) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 491) [Convolution]_output_and_/encoder/encoder/encoder.2/mconv.6/conv/Conv
shuffle_between_(Unnamed Layer* 491) [Convolution]_output_and_/encoder/encoder/encoder.2/mconv.6/conv/Conv
/encoder/encoder/encoder.2/mconv.6/conv/Conv
Reformatting CopyNode for Input Tensor 0 to shuffle_after_(Unnamed Layer* 502) [Convolution]_output
shuffle_after_(Unnamed Layer* 502) [Convolution]_output
(Unnamed Layer* 506) [Shuffle]
/encoder/encoder/encoder.2/mconv.7/Cast
/encoder/encoder/encoder.2/mconv.7/Cast_1
/encoder/encoder/encoder.2/mconv.7/Cast_2
/encoder/encoder/encoder.2/mconv.7/Cast_3
/encoder/encoder/encoder.2/mconv.7/Cast_4
/encoder/encoder/encoder.2/mconv.7/Cast_5
/encoder/encoder/encoder.2/mconv.7/Cast_6
/encoder/encoder/encoder.2/mconv.7/Cast_7
/encoder/encoder/encoder.2/mconv.7/Cast_8
/encoder/encoder/encoder.2/mconv.7/Cast_9
/encoder/encoder/encoder.2/mconv.7/Cast_10
/encoder/encoder/encoder.2/mconv.7/Cast_11
(Unnamed Layer* 524) [Shuffle]
/encoder/encoder/encoder.2/mconv.7/BatchNormalization
(Unnamed Layer* 526) [Shuffle]
/encoder/encoder/encoder.2/mconv.7/Cast_12
/encoder/encoder/encoder.2/mconv.7/Cast_13
/encoder/encoder/encoder.2/mconv.7/Cast_14
/encoder/encoder/encoder.2/mconv.7/Cast_15
/encoder/encoder/encoder.2/mconv.7/Cast_16
/encoder/encoder/encoder.2/mconv.7/Cast_17
/encoder/encoder/encoder.2/mconv.7/Cast_18
/encoder/encoder/encoder.2/mconv.7/Cast_19
/encoder/encoder/encoder.2/mconv.7/Cast_20
/encoder/encoder/encoder.2/mconv.7/Cast_21
/encoder/encoder/encoder.2/mconv.7/Cast_22
/encoder/encoder/encoder.2/mconv.7/Cast_23
PWN(/encoder/encoder/encoder.2/fc.1_1/Relu)
(Unnamed Layer* 546) [Shuffle]
shuffle_between_(Unnamed Layer* 546) [Shuffle]_output_and_/encoder/encoder/encoder.2/mconv.10/conv/Conv
/encoder/encoder/encoder.2/mconv.10/conv/Conv
shuffle_after_(Unnamed Layer* 547) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 547) [Convolution]_output_and_/encoder/encoder/encoder.2/mconv.11/conv/Conv
shuffle_between_(Unnamed Layer* 547) [Convolution]_output_and_/encoder/encoder/encoder.2/mconv.11/conv/Conv
/encoder/encoder/encoder.2/mconv.11/conv/Conv
shuffle_after_(Unnamed Layer* 558) [Convolution]_output
Reformatting CopyNode for Input Tensor 1 to {ForeignNode[/encoder/encoder/encoder.2/mconv.13/Cast...(Unnamed Layer* 722) [Shuffle]]}
Reformatting CopyNode for Input Tensor 2 to {ForeignNode[/encoder/encoder/encoder.2/mconv.13/Cast...(Unnamed Layer* 722) [Shuffle]]}
{ForeignNode[/encoder/encoder/encoder.2/mconv.13/Cast...(Unnamed Layer* 722) [Shuffle]]}
shuffle_between_(Unnamed Layer* 722) [Shuffle]_output_and_/encoder/encoder/encoder.3/mconv.0/conv/Conv
/encoder/encoder/encoder.3/mconv.0/conv/Conv
shuffle_after_(Unnamed Layer* 723) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 723) [Convolution]_output_and_/encoder/encoder/encoder.3/mconv.1/conv/Conv
shuffle_between_(Unnamed Layer* 723) [Convolution]_output_and_/encoder/encoder/encoder.3/mconv.1/conv/Conv
/encoder/encoder/encoder.3/mconv.1/conv/Conv
Reformatting CopyNode for Input Tensor 0 to shuffle_after_(Unnamed Layer* 734) [Convolution]_output
shuffle_after_(Unnamed Layer* 734) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 964) [Shuffle]_output_and_/encoder/encoder/encoder.3/res.0.0/conv/Conv
shuffle_between_(Unnamed Layer* 964) [Shuffle]_output_and_/encoder/encoder/encoder.3/res.0.0/conv/Conv
/encoder/encoder/encoder.3/res.0.0/conv/Conv
shuffle_after_(Unnamed Layer* 965) [Convolution]_output
(Unnamed Layer* 738) [Shuffle]
/encoder/encoder/encoder.3/mconv.2/Cast
/encoder/encoder/encoder.3/mconv.2/Cast_1
/encoder/encoder/encoder.3/mconv.2/Cast_2
/encoder/encoder/encoder.3/mconv.2/Cast_3
/encoder/encoder/encoder.3/mconv.2/Cast_4
/encoder/encoder/encoder.3/mconv.2/Cast_5
/encoder/encoder/encoder.3/mconv.2/Cast_6
/encoder/encoder/encoder.3/mconv.2/Cast_7
/encoder/encoder/encoder.3/mconv.2/Cast_8
/encoder/encoder/encoder.3/mconv.2/Cast_9
/encoder/encoder/encoder.3/mconv.2/Cast_10
/encoder/encoder/encoder.3/mconv.2/Cast_11
(Unnamed Layer* 756) [Shuffle]
/encoder/encoder/encoder.3/mconv.2/BatchNormalization
(Unnamed Layer* 758) [Shuffle]
/encoder/encoder/encoder.3/mconv.2/Cast_12
/encoder/encoder/encoder.3/mconv.2/Cast_13
/encoder/encoder/encoder.3/mconv.2/Cast_14
/encoder/encoder/encoder.3/mconv.2/Cast_15
/encoder/encoder/encoder.3/mconv.2/Cast_16
/encoder/encoder/encoder.3/mconv.2/Cast_17
/encoder/encoder/encoder.3/mconv.2/Cast_18
/encoder/encoder/encoder.3/mconv.2/Cast_19
/encoder/encoder/encoder.3/mconv.2/Cast_20
/encoder/encoder/encoder.3/mconv.2/Cast_21
/encoder/encoder/encoder.3/mconv.2/Cast_22
/encoder/encoder/encoder.3/mconv.2/Cast_23
Reformatting CopyNode for Input Tensor 0 to PWN(/encoder/encoder/encoder.3/fc.1/Relu)
PWN(/encoder/encoder/encoder.3/fc.1/Relu)
(Unnamed Layer* 778) [Shuffle]
shuffle_between_(Unnamed Layer* 778) [Shuffle]_output_and_/encoder/encoder/encoder.3/mconv.5/conv/Conv
/encoder/encoder/encoder.3/mconv.5/conv/Conv
shuffle_after_(Unnamed Layer* 779) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 779) [Convolution]_output_and_/encoder/encoder/encoder.3/mconv.6/conv/Conv
shuffle_between_(Unnamed Layer* 779) [Convolution]_output_and_/encoder/encoder/encoder.3/mconv.6/conv/Conv
/encoder/encoder/encoder.3/mconv.6/conv/Conv
Reformatting CopyNode for Input Tensor 0 to shuffle_after_(Unnamed Layer* 790) [Convolution]_output
shuffle_after_(Unnamed Layer* 790) [Convolution]_output
(Unnamed Layer* 794) [Shuffle]
/encoder/encoder/encoder.3/mconv.7/Cast
/encoder/encoder/encoder.3/mconv.7/Cast_1
/encoder/encoder/encoder.3/mconv.7/Cast_2
/encoder/encoder/encoder.3/mconv.7/Cast_3
/encoder/encoder/encoder.3/mconv.7/Cast_4
/encoder/encoder/encoder.3/mconv.7/Cast_5
/encoder/encoder/encoder.3/mconv.7/Cast_6
/encoder/encoder/encoder.3/mconv.7/Cast_7
/encoder/encoder/encoder.3/mconv.7/Cast_8
/encoder/encoder/encoder.3/mconv.7/Cast_9
/encoder/encoder/encoder.3/mconv.7/Cast_10
/encoder/encoder/encoder.3/mconv.7/Cast_11
(Unnamed Layer* 812) [Shuffle]
/encoder/encoder/encoder.3/mconv.7/BatchNormalization
(Unnamed Layer* 814) [Shuffle]
/encoder/encoder/encoder.3/mconv.7/Cast_12
/encoder/encoder/encoder.3/mconv.7/Cast_13
/encoder/encoder/encoder.3/mconv.7/Cast_14
/encoder/encoder/encoder.3/mconv.7/Cast_15
/encoder/encoder/encoder.3/mconv.7/Cast_16
/encoder/encoder/encoder.3/mconv.7/Cast_17
/encoder/encoder/encoder.3/mconv.7/Cast_18
/encoder/encoder/encoder.3/mconv.7/Cast_19
/encoder/encoder/encoder.3/mconv.7/Cast_20
/encoder/encoder/encoder.3/mconv.7/Cast_21
/encoder/encoder/encoder.3/mconv.7/Cast_22
/encoder/encoder/encoder.3/mconv.7/Cast_23
Reformatting CopyNode for Input Tensor 0 to PWN(/encoder/encoder/encoder.3/fc.1_1/Relu)
PWN(/encoder/encoder/encoder.3/fc.1_1/Relu)
(Unnamed Layer* 834) [Shuffle]
shuffle_between_(Unnamed Layer* 834) [Shuffle]_output_and_/encoder/encoder/encoder.3/mconv.10/conv/Conv
/encoder/encoder/encoder.3/mconv.10/conv/Conv
shuffle_after_(Unnamed Layer* 835) [Convolution]_output
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 835) [Convolution]_output_and_/encoder/encoder/encoder.3/mconv.11/conv/Conv
shuffle_between_(Unnamed Layer* 835) [Convolution]_output_and_/encoder/encoder/encoder.3/mconv.11/conv/Conv
/encoder/encoder/encoder.3/mconv.11/conv/Conv
shuffle_after_(Unnamed Layer* 846) [Convolution]_output
Reformatting CopyNode for Input Tensor 1 to {ForeignNode[/encoder/encoder/encoder.3/mconv.13/Cast...(Unnamed Layer* 1010) [Shuffle]]}
Reformatting CopyNode for Input Tensor 2 to {ForeignNode[/encoder/encoder/encoder.3/mconv.13/Cast...(Unnamed Layer* 1010) [Shuffle]]}
{ForeignNode[/encoder/encoder/encoder.3/mconv.13/Cast...(Unnamed Layer* 1010) [Shuffle]]}
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 1010) [Shuffle]_output_and_/encoder/encoder/encoder.4/mconv.0/conv/Conv
shuffle_between_(Unnamed Layer* 1010) [Shuffle]_output_and_/encoder/encoder/encoder.4/mconv.0/conv/Conv
/encoder/encoder/encoder.4/mconv.0/conv/Conv
shuffle_after_(Unnamed Layer* 1011) [Convolution]_output
shuffle_between_(Unnamed Layer* 1011) [Convolution]_output_and_/encoder/encoder/encoder.4/mconv.1/conv/Conv
/encoder/encoder/encoder.4/mconv.1/conv/Conv
shuffle_after_(Unnamed Layer* 1022) [Convolution]_output
Reformatting CopyNode for Input Tensor 2 to {ForeignNode[/encoder/encoder/encoder.4/mconv.3/Cast.../decoder/_pooling/Unsqueeze_9]}
{ForeignNode[/encoder/encoder/encoder.4/mconv.3/Cast.../decoder/_pooling/Unsqueeze_9]}
/decoder/_pooling/Expand_1
/decoder/_pooling/Expand
/decoder/_pooling/Tile
/decoder/_pooling/Tile_1
/encoder/encoder/encoder.4/mout/fc.1/Relu_output_0 copy
(Unnamed Layer* 1279) [Shuffle]
Reformatting CopyNode for Input Tensor 0 to shuffle_between_(Unnamed Layer* 1279) [Shuffle]_output_and_/decoder/_pooling/attention_layer/attention_layer.0/conv_layer/Conv + /decoder/_pooling/attention_layer/attention_layer.0/activation/Relu
shuffle_between_(Unnamed Layer* 1279) [Shuffle]_output_and_/decoder/_pooling/attention_layer/attention_layer.0/conv_layer/Conv + /decoder/_pooling/attention_layer/attention_layer.0/activation/Relu
/decoder/_pooling/attention_layer/attention_layer.0/conv_layer/Conv + /decoder/_pooling/attention_layer/attention_layer.0/activation/Relu
Reformatting CopyNode for Input Tensor 0 to shuffle_after_/decoder/_pooling/attention_layer/attention_layer.0/activation/Relu_out_tensor
shuffle_after_/decoder/_pooling/attention_layer/attention_layer.0/activation/Relu_out_tensor
squeeze_after_/decoder/_pooling/attention_layer/attention_layer.0/activation/Relu
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_1
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_2
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_3
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_4
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_5
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_6
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_7
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_8
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_9
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_10
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_11
(Unnamed Layer* 1303) [Shuffle]
/decoder/_pooling/attention_layer/attention_layer.0/bn/BatchNormalization
(Unnamed Layer* 1305) [Shuffle]
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_12
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_13
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_14
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_15
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_16
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_17
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_18
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_19
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_20
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_21
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_22
/decoder/_pooling/attention_layer/attention_layer.0/bn/Cast_23
Reformatting CopyNode for Input Tensor 0 to PWN(/decoder/_pooling/attention_layer/attention_layer.1/Tanh)
PWN(/decoder/_pooling/attention_layer/attention_layer.1/Tanh)
(Unnamed Layer* 1325) [Shuffle]
shuffle_between_(Unnamed Layer* 1325) [Shuffle]_output_and_/decoder/_pooling/attention_layer/attention_layer.2/Conv
/decoder/_pooling/attention_layer/attention_layer.2/Conv
shuffle_after_(Unnamed Layer* 1326) [Convolution]_output
{ForeignNode[/decoder/_pooling/Cast_5...(Unnamed Layer* 1452) [Shuffle]]}
Reformatting CopyNode for Input Tensor 0 to /decoder/emb_layers.0.1/Conv
/decoder/emb_layers.0.1/Conv
Reformatting CopyNode for Input Tensor 0 to (Unnamed Layer* 1457) [Shuffle] + /decoder/Squeeze_1
(Unnamed Layer* 1457) [Shuffle] + /decoder/Squeeze_1
Reformatting CopyNode for Output Tensor 0 to (Unnamed Layer* 1457) [Shuffle] + /decoder/Squeeze_1
Reformatting CopyNode for Input Tensor 0 to /decoder/emb_layers.0/emb_layers.0.1/Conv
/decoder/emb_layers.0/emb_layers.0.1/Conv
Reformatting CopyNode for Input Tensor 0 to (Unnamed Layer* 1413) [Shuffle] + /decoder/Squeeze
(Unnamed Layer* 1413) [Shuffle] + /decoder/Squeeze
(Unnamed Layer* 1462) [ElementWise] + /decoder/ReduceL2 + /decoder/ReduceL2_339
PWN(PWN(/decoder/Constant_1_output_0 + (Unnamed Layer* 1467) [Shuffle], (Unnamed Layer* 1469) [ElementWise]), PWN((Unnamed Layer* 1466) [Constant] + (Unnamed Layer* 1468) [Shuffle], (Unnamed Layer* 1470) [ElementWise]))
/decoder/Expand
/decoder/Div
reshape_before_/decoder/final/MatMul
Reformatting CopyNode for Input Tensor 0 to /decoder/final/MatMul
/decoder/final/MatMul
Reformatting CopyNode for Input Tensor 0 to reshape_after_/decoder/final/MatMul
reshape_after_/decoder/final/MatMul

Bindings:
audio_signal
length
logits
embs
[03/09/2023-21:47:50] [I] Starting inference
[03/09/2023-21:47:53] [I] Warmup completed 30 queries over 200 ms
[03/09/2023-21:47:53] [I] Timing trace has 422 queries over 3.02302 s
[03/09/2023-21:47:53] [I] 
[03/09/2023-21:47:53] [I] === Trace details ===
[03/09/2023-21:47:53] [I] Trace averages of 10 runs:
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.73866 ms - Host latency: 7.23662 ms (enqueue 3.13238 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.74574 ms - Host latency: 7.24913 ms (enqueue 3.24203 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.70132 ms - Host latency: 7.2118 ms (enqueue 3.3023 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.72625 ms - Host latency: 7.24046 ms (enqueue 3.18077 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.73943 ms - Host latency: 7.24821 ms (enqueue 3.29568 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.72515 ms - Host latency: 7.23874 ms (enqueue 3.2769 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.71273 ms - Host latency: 7.22096 ms (enqueue 3.28856 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.73475 ms - Host latency: 7.23982 ms (enqueue 2.94738 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.71179 ms - Host latency: 7.22067 ms (enqueue 3.22761 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.74106 ms - Host latency: 7.25438 ms (enqueue 3.1741 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.81032 ms - Host latency: 7.3145 ms (enqueue 3.24466 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.78872 ms - Host latency: 7.29268 ms (enqueue 3.23721 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.7771 ms - Host latency: 7.27214 ms (enqueue 3.23876 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.76962 ms - Host latency: 7.27153 ms (enqueue 3.20966 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.7485 ms - Host latency: 7.24775 ms (enqueue 3.05645 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.74655 ms - Host latency: 7.2478 ms (enqueue 3.23301 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.754 ms - Host latency: 7.26526 ms (enqueue 3.17219 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.76714 ms - Host latency: 7.27935 ms (enqueue 3.16182 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.75376 ms - Host latency: 7.25601 ms (enqueue 2.78651 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.73131 ms - Host latency: 7.23961 ms (enqueue 3.23243 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.77465 ms - Host latency: 7.29276 ms (enqueue 3.17722 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.76602 ms - Host latency: 7.29001 ms (enqueue 3.11594 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.74666 ms - Host latency: 7.2553 ms (enqueue 2.68644 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.76755 ms - Host latency: 7.2787 ms (enqueue 2.8905 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.77473 ms - Host latency: 7.2692 ms (enqueue 2.40334 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.75836 ms - Host latency: 7.25934 ms (enqueue 2.40823 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.91326 ms - Host latency: 7.42234 ms (enqueue 2.75349 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 7.58937 ms - Host latency: 8.06487 ms (enqueue 1.96571 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 13.4604 ms - Host latency: 13.9285 ms (enqueue 1.78154 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 7.23994 ms - Host latency: 7.7283 ms (enqueue 1.78391 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.75183 ms - Host latency: 7.23152 ms (enqueue 1.8488 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.78589 ms - Host latency: 7.2833 ms (enqueue 1.78198 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.75793 ms - Host latency: 7.2696 ms (enqueue 3.19187 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.81602 ms - Host latency: 7.31887 ms (enqueue 2.37625 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.79727 ms - Host latency: 7.2991 ms (enqueue 2.51536 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.77236 ms - Host latency: 7.29175 ms (enqueue 3.03672 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 7.19841 ms - Host latency: 7.70371 ms (enqueue 2.71545 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 8.29995 ms - Host latency: 8.7946 ms (enqueue 2.22634 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 12.418 ms - Host latency: 12.8837 ms (enqueue 1.78894 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 7.27927 ms - Host latency: 7.76296 ms (enqueue 2.04956 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.82854 ms - Host latency: 7.30779 ms (enqueue 1.86838 ms)
[03/09/2023-21:47:53] [I] Average on 10 runs - GPU latency: 6.76965 ms - Host latency: 7.25386 ms (enqueue 1.98186 ms)
[03/09/2023-21:47:53] [I] 
[03/09/2023-21:47:53] [I] === Performance summary ===
[03/09/2023-21:47:53] [I] Throughput: 139.596 qps
[03/09/2023-21:47:53] [I] Latency: min = 7.06567 ms, max = 21.1899 ms, mean = 7.64636 ms, median = 7.26294 ms, percentile(90%) = 7.35291 ms, percentile(95%) = 8.26929 ms, percentile(99%) = 18.9893 ms
[03/09/2023-21:47:53] [I] Enqueue Time: min = 1.76294 ms, max = 3.59631 ms, mean = 2.76133 ms, median = 3.07278 ms, percentile(90%) = 3.37244 ms, percentile(95%) = 3.42175 ms, percentile(99%) = 3.50586 ms
[03/09/2023-21:47:53] [I] H2D Latency: min = 0.218262 ms, max = 0.423096 ms, mean = 0.296416 ms, median = 0.294647 ms, percentile(90%) = 0.315308 ms, percentile(95%) = 0.321777 ms, percentile(99%) = 0.343262 ms
[03/09/2023-21:47:53] [I] GPU Compute Time: min = 6.58643 ms, max = 20.7593 ms, mean = 7.14528 ms, median = 6.75989 ms, percentile(90%) = 6.85571 ms, percentile(95%) = 7.77881 ms, percentile(99%) = 18.5413 ms
[03/09/2023-21:47:53] [I] D2H Latency: min = 0.177734 ms, max = 0.234009 ms, mean = 0.204667 ms, median = 0.203613 ms, percentile(90%) = 0.215515 ms, percentile(95%) = 0.217041 ms, percentile(99%) = 0.219727 ms
[03/09/2023-21:47:53] [I] Total Host Walltime: 3.02302 s
[03/09/2023-21:47:53] [I] Total GPU Compute Time: 3.01531 s
[03/09/2023-21:47:53] [W] * GPU compute time is unstable, with coefficient of variance = 26.8703%.
[03/09/2023-21:47:53] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[03/09/2023-21:47:53] [I] Explanations of the performance metrics are printed in the verbose logs.
[03/09/2023-21:47:53] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8501] # trtexec --dumpLayerInfo --onnx=/models/titanet_large.onnx --minShapes=audio_signal:1x80x100,length:1 --optShapes=audio_signal:16x80x200,length:16 --maxShapes=audio_signal:16x80x3000,length:16 --fp16 --saveEngine=/models/titanet_large.plan

Triton Information

I'm using Triton 22.12 (the official Triton container).

Expected behavior

I expected to be able to set max_batch_size to 16 and use batching.

dyastremsky commented 1 year ago

The warning you are receiving is because your config sets max_batch_size to 0, which tells Triton that batching is unavailable for this model.

What are the verbose logs (--log-verbose 1) when you try setting max_batch_size to 16? You'd need to remove the first variable dimension from all your inputs/outputs, since the batch dimension will be your first dimension. And for your [-1] input "length", you'll need to use the reshape field.
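Something like this should work as a starting point (a sketch based on the config you posted, not tested here):

backend: "tensorrt"
max_batch_size: 16
input [
  {
    name: "length"
    data_type: TYPE_INT32
    dims: [ 1 ]
    # Triton prepends the batch dimension implicitly; reshaping the
    # per-element [1] to [] makes the engine see length as shape (batch,).
    reshape: { shape: [ ] }
  },
  {
    name: "audio_signal"
    data_type: TYPE_FP32
    dims: [ 80, -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ 16681 ]
  },
  {
    name: "embs"
    data_type: TYPE_FP32
    dims: [ 192 ]
  }
]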

entn-at commented 1 year ago

Many thanks! It was indeed a configuration issue, specifically the superfluous leading variable dimension (-1). Together with the reshape field, it's working as expected.
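In case it helps anyone else, a quick client-side sketch to sanity-check batched inference (untested as written; assumes the default HTTP port and the model name titanet_large):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = 4  # anything up to max_batch_size (16)
audio = np.random.randn(batch, 80, 200).astype(np.float32)
# "length" has dims [1] in the config; Triton reshapes it to [] per element
length = np.full((batch, 1), 200, dtype=np.int32)

inputs = [
    httpclient.InferInput("audio_signal", list(audio.shape), "FP32"),
    httpclient.InferInput("length", list(length.shape), "INT32"),
]
inputs[0].set_data_from_numpy(audio)
inputs[1].set_data_from_numpy(length)

result = client.infer(model_name="titanet_large", inputs=inputs)
print(result.as_numpy("embs").shape)  # expect (batch, 192)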