Closed: lightyfr closed this issue 1 year ago.
While trying to convert my ONNX files to TensorRT engine files, the build stops at a certain point and stays stuck there indefinitely. This is the last output I got:

root@6f2b8ede7bd1:/workspace/tensorrt# trtexec --fp16 --onnx=4x_fatal_Anime_500000_G.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x720x1280 --maxShapes=input:1x3x1080x1920 --saveEngine=fatalESRGAN4x.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --buildOnly
&&&& RUNNING TensorRT.trtexec [TensorRT v8503] # trtexec --fp16 --onnx=4x_fatal_Anime_500000_G.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x720x1280 --maxShapes=input:1x3x1080x1920 --saveEngine=fatalESRGAN4x.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --buildOnly
[03/20/2023-21:23:17] [I] === Model Options ===
[03/20/2023-21:23:17] [I] Format: ONNX
[03/20/2023-21:23:17] [I] Model: 4x_fatal_Anime_500000_G.onnx
[03/20/2023-21:23:17] [I] Output:
[03/20/2023-21:23:17] [I] === Build Options ===
[03/20/2023-21:23:17] [I] Max batch: explicit batch
[03/20/2023-21:23:17] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/20/2023-21:23:17] [I] minTiming: 1
[03/20/2023-21:23:17] [I] avgTiming: 8
[03/20/2023-21:23:17] [I] Precision: FP32+FP16
[03/20/2023-21:23:17] [I] LayerPrecisions:
[03/20/2023-21:23:17] [I] Calibration:
[03/20/2023-21:23:17] [I] Refit: Disabled
[03/20/2023-21:23:17] [I] Sparsity: Disabled
[03/20/2023-21:23:17] [I] Safe mode: Disabled
[03/20/2023-21:23:17] [I] DirectIO mode: Disabled
[03/20/2023-21:23:17] [I] Restricted mode: Disabled
[03/20/2023-21:23:17] [I] Build only: Enabled
[03/20/2023-21:23:17] [I] Save engine: fatalESRGAN4x.engine
[03/20/2023-21:23:17] [I] Load engine:
[03/20/2023-21:23:17] [I] Profiling verbosity: 0
[03/20/2023-21:23:17] [I] Tactic sources: cublas [OFF], cublasLt [OFF], cudnn [ON],
[03/20/2023-21:23:17] [I] timingCacheMode: local
[03/20/2023-21:23:17] [I] timingCacheFile:
[03/20/2023-21:23:17] [I] Heuristic: Disabled
[03/20/2023-21:23:17] [I] Preview Features: Use default preview flags.
[03/20/2023-21:23:17] [I] Input(s)s format: fp32:CHW
[03/20/2023-21:23:17] [I] Output(s)s format: fp32:CHW
[03/20/2023-21:23:17] [I] Input build shape: input=1x3x8x8+1x3x720x1280+1x3x1080x1920
[03/20/2023-21:23:17] [I] Input calibration shapes: model
[03/20/2023-21:23:17] [I] === System Options ===
[03/20/2023-21:23:17] [I] Device: 0
[03/20/2023-21:23:17] [I] DLACore:
[03/20/2023-21:23:17] [I] Plugins:
[03/20/2023-21:23:17] [I] === Inference Options ===
[03/20/2023-21:23:17] [I] Batch: Explicit
[03/20/2023-21:23:17] [I] Input inference shape: input=1x3x720x1280
[03/20/2023-21:23:17] [I] Iterations: 10
[03/20/2023-21:23:17] [I] Duration: 3s (+ 200ms warm up)
[03/20/2023-21:23:17] [I] Sleep time: 0ms
[03/20/2023-21:23:17] [I] Idle time: 0ms
[03/20/2023-21:23:17] [I] Streams: 1
[03/20/2023-21:23:17] [I] ExposeDMA: Disabled
[03/20/2023-21:23:17] [I] Data transfers: Enabled
[03/20/2023-21:23:17] [I] Spin-wait: Disabled
[03/20/2023-21:23:17] [I] Multithreading: Disabled
[03/20/2023-21:23:17] [I] CUDA Graph: Disabled
[03/20/2023-21:23:17] [I] Separate profiling: Disabled
[03/20/2023-21:23:17] [I] Time Deserialize: Disabled
[03/20/2023-21:23:17] [I] Time Refit: Disabled
[03/20/2023-21:23:17] [I] NVTX verbosity: 0
[03/20/2023-21:23:17] [I] Persistent Cache Ratio: 0
[03/20/2023-21:23:17] [I] Inputs:
[03/20/2023-21:23:17] [I] === Reporting Options ===
[03/20/2023-21:23:17] [I] Verbose: Disabled
[03/20/2023-21:23:17] [I] Averages: 10 inferences
[03/20/2023-21:23:17] [I] Percentiles: 90,95,99
[03/20/2023-21:23:17] [I] Dump refittable layers:Disabled
[03/20/2023-21:23:17] [I] Dump output: Disabled
[03/20/2023-21:23:17] [I] Profile: Disabled
[03/20/2023-21:23:17] [I] Export timing to JSON file:
[03/20/2023-21:23:17] [I] Export output to JSON file:
[03/20/2023-21:23:17] [I] Export profile to JSON file:
[03/20/2023-21:23:17] [I]
[03/20/2023-21:23:18] [I] === Device Information ===
[03/20/2023-21:23:18] [I] Selected Device: NVIDIA GeForce RTX 3070
[03/20/2023-21:23:18] [I] Compute Capability: 8.6
[03/20/2023-21:23:18] [I] SMs: 46
[03/20/2023-21:23:18] [I] Compute Clock Rate: 1.725 GHz
[03/20/2023-21:23:18] [I] Device Global Memory: 8191 MiB
[03/20/2023-21:23:18] [I] Shared Memory per SM: 100 KiB
[03/20/2023-21:23:18] [I] Memory Bus Width: 256 bits (ECC disabled)
[03/20/2023-21:23:18] [I] Memory Clock Rate: 7.001 GHz
[03/20/2023-21:23:18] [I]
[03/20/2023-21:23:18] [I] TensorRT version: 8.5.3
[03/20/2023-21:23:18] [I] [TRT] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 26, GPU 1077 (MiB)
[03/20/2023-21:23:20] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +546, GPU +118, now: CPU 627, GPU 1195 (MiB)
[03/20/2023-21:23:20] [I] Start parsing network model
[03/20/2023-21:23:23] [I] [TRT] ----------------------------------------------------------------
[03/20/2023-21:23:23] [I] [TRT] Input filename: 4x_fatal_Anime_500000_G.onnx
[03/20/2023-21:23:23] [I] [TRT] ONNX IR version: 0.0.7
[03/20/2023-21:23:23] [I] [TRT] Opset version: 14
[03/20/2023-21:23:23] [I] [TRT] Producer name: pytorch
[03/20/2023-21:23:23] [I] [TRT] Producer version: 1.10
[03/20/2023-21:23:23] [I] [TRT] Domain:
[03/20/2023-21:23:23] [I] [TRT] Model version: 0
[03/20/2023-21:23:23] [I] [TRT] Doc string:
[03/20/2023-21:23:23] [I] [TRT] ----------------------------------------------------------------
[03/20/2023-21:23:24] [I] Finish parsing network model
[03/20/2023-21:23:25] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +10, GPU +10, now: CPU 739, GPU 1205 (MiB)
[03/20/2023-21:23:25] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
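The log ends right after "Local timing cache in use", which is most likely where the builder starts optimizing the network and timing candidate kernels (tactics); with a dynamic input range up to 1x3x1080x1920 on an ESRGAN-style network, that phase can plausibly run for a very long time. One way to tell a slow build from a real hang, offered as a diagnostic sketch rather than a known fix, is to rerun the same command with trtexec's `--verbose` flag (reported as "Verbose: Disabled" in the log above), which makes TensorRT print verbose messages while it works through layers and tactics. The helper name `build_verbose` is hypothetical, and the script assumes bash with `trtexec` on the PATH:

```shell
#!/usr/bin/env bash
# build_verbose ONNX ENGINE LOG
# Hypothetical helper: the build command from this issue with --verbose
# added; all output is shown on the terminal and also copied to LOG.
build_verbose() {
  local onnx="$1" engine="$2" log="$3"
  trtexec --fp16 --onnx="$onnx" \
    --minShapes=input:1x3x8x8 \
    --optShapes=input:1x3x720x1280 \
    --maxShapes=input:1x3x1080x1920 \
    --saveEngine="$engine" \
    --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT \
    --buildOnly --verbose 2>&1 | tee "$log"
  # Return trtexec's exit status rather than tee's.
  return "${PIPESTATUS[0]}"
}
```

For example, run `build_verbose 4x_fatal_Anime_500000_G.onnx fatalESRGAN4x.engine build.log` and watch `tail -f build.log` from a second shell: if new `[V]` lines keep appearing, the builder is still making progress rather than being stuck.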
Sometimes it can take hours; sometimes restarting it is enough to get it working. I remember ESRGAN taking a very long time to build. I'm not sure what else to say, since I cannot reproduce the random hanging.
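Since restarting sometimes gets a stuck build through, one way to automate that is a loop that gives each attempt a generous time limit with GNU coreutils `timeout` (which exits with status 124 when the limit is reached) and restarts only on a timeout. This is a minimal bash sketch; the function name `retry_build`, the limit, and the attempt count are hypothetical choices. Because a healthy ESRGAN build can legitimately take hours, set the limit well above your expected build time so a slow but working build is not killed.

```shell
#!/usr/bin/env bash
# retry_build LIMIT ATTEMPTS CMD [ARGS...]
# Hypothetical helper: runs CMD under GNU `timeout` and restarts it when an
# attempt exceeds LIMIT (e.g. 4h). Any other exit status is returned as-is.
retry_build() {
  local limit="$1" attempts="$2"
  shift 2
  local i status
  for ((i = 1; i <= attempts; i++)); do
    status=0
    timeout "$limit" "$@" || status=$?
    # 124 is timeout's exit status when the time limit was reached.
    if [ "$status" -ne 124 ]; then
      return "$status"   # finished (success or a real error): stop retrying
    fi
    echo "retry_build: attempt $i/$attempts hit the $limit limit, restarting" >&2
  done
  return 124
}
```

Usage with the command from the issue: `retry_build 4h 3 trtexec --fp16 --onnx=4x_fatal_Anime_500000_G.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x720x1280 --maxShapes=input:1x3x1080x1920 --saveEngine=fatalESRGAN4x.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --buildOnly`. A genuine build error is passed straight through instead of being retried.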