styler00dollar / VSGAN-tensorrt-docker

Using VapourSynth with super resolution and interpolation models and speeding them up with TensorRT.
BSD 3-Clause "New" or "Revised" License

Error when converting RIFE ONNX model to TRT #57

Closed. xiazhenyz closed this issue 10 months ago.

xiazhenyz commented 10 months ago

I want to convert a RIFE ONNX model (`rife46_ensembleFalse_op18_clamp.onnx`) to TRT, but an error occurs during conversion. The command is:

```
./trtexec --fp16 --onnx=rife46_ensembleFalse_op18_clamp.onnx --minShapes=input:1x8x64x64 --optShapes=input:1x8x720x1280 --maxShapes=input:1x8x1080x1920 --saveEngine=model.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --preview=+fasterDynamicShapes0805
```

The log is:

```
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # ./trtexec --fp16 --onnx=rife46_ensembleFalse_op18_clamp.onnx --minShapes=input:1x8x64x64 --optShapes=input:1x8x720x1280 --maxShapes=input:1x8x1080x1920 --saveEngine=model.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --preview=+fasterDynamicShapes0805
[01/22/2024-17:23:30] [I] === Model Options ===
[01/22/2024-17:23:30] [I] Format: ONNX
[01/22/2024-17:23:30] [I] Model: rife46_ensembleFalse_op18_clamp.onnx
[01/22/2024-17:23:30] [I] Output:
[01/22/2024-17:23:30] [I] === Build Options ===
[01/22/2024-17:23:30] [I] Max batch: explicit batch
[01/22/2024-17:23:30] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[01/22/2024-17:23:30] [I] minTiming: 1
[01/22/2024-17:23:30] [I] avgTiming: 8
[01/22/2024-17:23:30] [I] Precision: FP32+FP16
[01/22/2024-17:23:30] [I] LayerPrecisions:
[01/22/2024-17:23:30] [I] Layer Device Types:
[01/22/2024-17:23:30] [I] Calibration:
[01/22/2024-17:23:30] [I] Refit: Disabled
[01/22/2024-17:23:30] [I] Version Compatible: Disabled
[01/22/2024-17:23:30] [I] TensorRT runtime: full
[01/22/2024-17:23:30] [I] Lean DLL Path:
[01/22/2024-17:23:30] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/22/2024-17:23:30] [I] Exclude Lean Runtime: Disabled
[01/22/2024-17:23:30] [I] Sparsity: Disabled
[01/22/2024-17:23:30] [I] Safe mode: Disabled
[01/22/2024-17:23:30] [I] Build DLA standalone loadable: Disabled
[01/22/2024-17:23:30] [I] Allow GPU fallback for DLA: Disabled
[01/22/2024-17:23:30] [I] DirectIO mode: Disabled
[01/22/2024-17:23:30] [I] Restricted mode: Disabled
[01/22/2024-17:23:30] [I] Skip inference: Enabled
[01/22/2024-17:23:30] [I] Save engine: model.engine
[01/22/2024-17:23:30] [I] Load engine:
[01/22/2024-17:23:30] [I] Profiling verbosity: 0
[01/22/2024-17:23:30] [I] Tactic sources: cublas [OFF], cublasLt [OFF], cudnn [ON],
[01/22/2024-17:23:30] [I] timingCacheMode: local
[01/22/2024-17:23:30] [I] timingCacheFile:
[01/22/2024-17:23:30] [I] Heuristic: Disabled
[01/22/2024-17:23:30] [I] Preview Features: kFASTER_DYNAMIC_SHAPES_0805 [ON],
[01/22/2024-17:23:30] [I] MaxAuxStreams: -1
[01/22/2024-17:23:30] [I] BuilderOptimizationLevel: -1
[01/22/2024-17:23:30] [I] Input(s)s format: fp32:CHW
[01/22/2024-17:23:30] [I] Output(s)s format: fp32:CHW
[01/22/2024-17:23:30] [I] Input build shape: input=1x8x64x64+1x8x720x1280+1x8x1080x1920
[01/22/2024-17:23:30] [I] Input calibration shapes: model
[01/22/2024-17:23:30] [I] === System Options ===
[01/22/2024-17:23:30] [I] Device: 0
[01/22/2024-17:23:30] [I] DLACore:
[01/22/2024-17:23:30] [I] Plugins:
[01/22/2024-17:23:30] [I] setPluginsToSerialize:
[01/22/2024-17:23:30] [I] dynamicPlugins:
[01/22/2024-17:23:30] [I] ignoreParsedPluginLibs: 0
[01/22/2024-17:23:30] [I]
[01/22/2024-17:23:30] [I] === Inference Options ===
[01/22/2024-17:23:30] [I] Batch: Explicit
[01/22/2024-17:23:30] [I] Input inference shape: input=1x8x720x1280
[01/22/2024-17:23:30] [I] Iterations: 10
[01/22/2024-17:23:30] [I] Duration: 3s (+ 200ms warm up)
[01/22/2024-17:23:30] [I] Sleep time: 0ms
[01/22/2024-17:23:30] [I] Idle time: 0ms
[01/22/2024-17:23:30] [I] Inference Streams: 1
[01/22/2024-17:23:30] [I] ExposeDMA: Disabled
[01/22/2024-17:23:30] [I] Data transfers: Enabled
[01/22/2024-17:23:30] [I] Spin-wait: Disabled
[01/22/2024-17:23:30] [I] Multithreading: Disabled
[01/22/2024-17:23:30] [I] CUDA Graph: Disabled
[01/22/2024-17:23:30] [I] Separate profiling: Disabled
[01/22/2024-17:23:30] [I] Time Deserialize: Disabled
[01/22/2024-17:23:30] [I] Time Refit: Disabled
[01/22/2024-17:23:30] [I] NVTX verbosity: 0
[01/22/2024-17:23:30] [I] Persistent Cache Ratio: 0
[01/22/2024-17:23:30] [I] Inputs:
[01/22/2024-17:23:30] [I] === Reporting Options ===
[01/22/2024-17:23:30] [I] Verbose: Disabled
[01/22/2024-17:23:30] [I] Averages: 10 inferences
[01/22/2024-17:23:30] [I] Percentiles: 90,95,99
[01/22/2024-17:23:30] [I] Dump refittable layers:Disabled
[01/22/2024-17:23:30] [I] Dump output: Disabled
[01/22/2024-17:23:30] [I] Profile: Disabled
[01/22/2024-17:23:30] [I] Export timing to JSON file:
[01/22/2024-17:23:30] [I] Export output to JSON file:
[01/22/2024-17:23:30] [I] Export profile to JSON file:
[01/22/2024-17:23:30] [I]
[01/22/2024-17:23:30] [I] === Device Information ===
[01/22/2024-17:23:30] [I] Selected Device: Tesla V100-SXM2-32GB
[01/22/2024-17:23:30] [I] Compute Capability: 7.0
[01/22/2024-17:23:30] [I] SMs: 80
[01/22/2024-17:23:30] [I] Device Global Memory: 32510 MiB
[01/22/2024-17:23:30] [I] Shared Memory per SM: 96 KiB
[01/22/2024-17:23:30] [I] Memory Bus Width: 4096 bits (ECC enabled)
[01/22/2024-17:23:30] [I] Application Compute Clock Rate: 1.53 GHz
[01/22/2024-17:23:30] [I] Application Memory Clock Rate: 0.877 GHz
[01/22/2024-17:23:30] [I]
[01/22/2024-17:23:30] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/22/2024-17:23:30] [I]
[01/22/2024-17:23:30] [I] TensorRT version: 8.6.1
[01/22/2024-17:23:30] [I] Loading standard plugins
[01/22/2024-17:23:31] [I] [TRT] [MemUsageChange] Init CUDA: CPU +266, GPU +0, now: CPU 277, GPU 2131 (MiB)
[01/22/2024-17:23:31] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 277 MiB, GPU 2131 MiB
[01/22/2024-17:23:31] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 389 MiB, GPU 2155 MiB
[01/22/2024-17:23:31] [I] Start parsing network model.
[01/22/2024-17:23:31] [I] [TRT] ----------------------------------------------------------------
[01/22/2024-17:23:31] [I] [TRT] Input filename: rife46_ensembleFalse_op18_clamp.onnx
[01/22/2024-17:23:31] [I] [TRT] ONNX IR version: 0.0.8
[01/22/2024-17:23:31] [I] [TRT] Opset version: 18
[01/22/2024-17:23:31] [I] [TRT] Producer name: pytorch
[01/22/2024-17:23:31] [I] [TRT] Producer version: 2.1.1
[01/22/2024-17:23:31] [I] [TRT] Domain:
[01/22/2024-17:23:31] [I] [TRT] Model version: 0
[01/22/2024-17:23:31] [I] [TRT] Doc string:
[01/22/2024-17:23:31] [I] [TRT] ----------------------------------------------------------------
[01/22/2024-17:23:31] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/22/2024-17:23:31] [W] [TRT] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[01/22/2024-17:23:31] [E] Error[4]: [shuffleNode.cpp::symbolicExecute::387] Error Code 4: Internal Error (/Reshape: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[01/22/2024-17:23:31] [E] [TRT] ModelImporter.cpp:773: While parsing node number 98 [Pad -> "/Pad_output_0"]:
[01/22/2024-17:23:31] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[01/22/2024-17:23:31] [E] [TRT] ModelImporter.cpp:775: input: "/Slice_output_0" input: "/Cast_5_output_0" input: "" output: "/Pad_output_0" name: "/Pad" op_type: "Pad" attribute { name: "mode" s: "constant" type: STRING }
[01/22/2024-17:23:31] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[01/22/2024-17:23:31] [E] [TRT] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - /Pad
[shuffleNode.cpp::symbolicExecute::387] Error Code 4: Internal Error (/Reshape: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[01/22/2024-17:23:31] [E] Failed to parse onnx file
[01/22/2024-17:23:31] [I] Finished parsing network model. Parse time: 0.0638087
[01/22/2024-17:23:31] [E] Parsing model failed
[01/22/2024-17:23:31] [E] Failed to create engine from model or file.
[01/22/2024-17:23:31] [E] Engine set up failed
```

My environment is:

```
CUDA 11.3
TensorRT 8.6 GA
```

I also tried TRT v8.2 and got the same error.
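The failure happens during ONNX parsing, before any engine building, so it can be reproduced without trtexec. The sketch below is only illustrative, not from this thread; it assumes the `tensorrt` Python package matching the installed TRT build is available:

```python
# Sketch: run only the ONNX parsing step through the TensorRT Python API,
# so the parser errors can be inspected in isolation from the trtexec run.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Explicit-batch network, matching trtexec's "Max batch: explicit batch".
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("rife46_ensembleFalse_op18_clamp.onnx", "rb") as f:
    ok = parser.parse(f.read())

if not ok:
    # Print every error the parser recorded for the failed parse,
    # e.g. the "/Pad ... IShuffleLayer" error shown in the log above.
    for i in range(parser.num_errors):
        print(parser.get_error(i))
```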

styler00dollar commented 10 months ago

I already explained version requirements here.

TRT 9.2

```
&&&& PASSED TensorRT.trtexec [TensorRT v9200] # trtexec --fp16 --onnx=rife46_ensembleFalse_op18_clamp.onnx --minShapes=input:1x8x64x64 --optShapes=input:1x8x720x1280 --maxShapes=input:1x8x1080x1920 --saveEngine=rife46_ensembleFalse_op18_clamp.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --preview=+fasterDynamicShapes0805
```

TRT 8.6

```
&&&& PASSED TensorRT.trtexec [TensorRT v8601] # trtexec --fp16 --onnx=rife46_ensembleFalse_op18_clamp.onnx --minShapes=input:1x8x64x64 --optShapes=input:1x8x720x1280 --maxShapes=input:1x8x1080x1920 --saveEngine=rife46_ensembleFalse_op18_clamp.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --preview=+fasterDynamicShapes0805
```
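Since the same command passes on both versions above, a quick sanity check is to confirm which TensorRT build an environment actually loads. A minimal sketch, assuming the `tensorrt` Python bindings are installed:

```python
# Sketch: print the TensorRT version Python resolves, to compare against
# the 8.6 / 9.2 builds the command was verified with above.
import tensorrt as trt

print(trt.__version__)  # e.g. "8.6.1"
```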

If you use my docker, it works. Get the latest one if you haven't already. Looks like user error to me.
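As background: serialized engines are tied to the exact TensorRT version (and GPU) they were built with, and the log above shows `Version Compatible: Disabled`, which is part of why a pinned docker image gives reproducible results. A hedged sketch, not from this thread, of checking that an environment can load an engine built by the commands above:

```python
# Sketch: deserializing a saved engine only succeeds when the runtime's
# TensorRT version matches the one that built it.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

with open("rife46_ensembleFalse_op18_clamp.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# deserialize_cuda_engine returns None on a version or hardware mismatch.
print("engine loaded:", engine is not None)
```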