When I try to build an engine, TensorRT throws errors and fails (trtexec log below). Nothing fixed it until I downgraded TensorRT: I copied the entire vsmlrt-cuda folder from mpv-janai v2.0.2 into v3.0's vapoursynth64/plugins folder, overwriting the existing one.
After that, v3 finally generated the engine, and I confirmed that it works. However, trtexec still warned that, because of INT64, the result might be less accurate. Is that true? Should I be worried about it?
Error log from before I downgraded TensorRT (stock janai v3):
&&&& RUNNING TensorRT.trtexec [TensorRT v9200] # C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\..\vapoursynth64\plugins\vsmlrt-cuda\trtexec --fp16 --onnx=C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\onnx\2x_AnimeJaNai_HD_V3_UltraCompact.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x1080x1920 --maxShapes=input:1x3x1080x1920 --skipInference --infStreams=4 --builderOptimizationLevel=4 --saveEngine=C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\onnx\2x_AnimeJaNai_HD_V3_UltraCompact.engine --tacticSources=-CUDNN,-CUBLAS,-CUBLAS_LT
[05/19/2024-14:15:37] [I] === Model Options ===
[05/19/2024-14:15:37] [I] Format: ONNX
[05/19/2024-14:15:37] [I] Model: C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\onnx\2x_AnimeJaNai_HD_V3_UltraCompact.onnx
[05/19/2024-14:15:37] [I] Output:
[05/19/2024-14:15:37] [I] === Build Options ===
[05/19/2024-14:15:37] [I] Max batch: explicit batch
[05/19/2024-14:15:37] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[05/19/2024-14:15:37] [I] minTiming: 1
[05/19/2024-14:15:37] [I] avgTiming: 8
[05/19/2024-14:15:37] [I] Precision: FP32+FP16
[05/19/2024-14:15:37] [I] LayerPrecisions:
[05/19/2024-14:15:37] [I] Layer Device Types:
[05/19/2024-14:15:37] [I] Calibration:
[05/19/2024-14:15:37] [I] Refit: Disabled
[05/19/2024-14:15:37] [I] Weightless: Disabled
[05/19/2024-14:15:37] [I] Version Compatible: Disabled
[05/19/2024-14:15:37] [I] ONNX Native InstanceNorm: Disabled
[05/19/2024-14:15:37] [I] TensorRT runtime: full
[05/19/2024-14:15:37] [I] Lean DLL Path:
[05/19/2024-14:15:37] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[05/19/2024-14:15:37] [I] Exclude Lean Runtime: Disabled
[05/19/2024-14:15:37] [I] Sparsity: Disabled
[05/19/2024-14:15:37] [I] Safe mode: Disabled
[05/19/2024-14:15:37] [I] Build DLA standalone loadable: Disabled
[05/19/2024-14:15:37] [I] Allow GPU fallback for DLA: Disabled
[05/19/2024-14:15:37] [I] DirectIO mode: Disabled
[05/19/2024-14:15:37] [I] Restricted mode: Disabled
[05/19/2024-14:15:37] [I] Skip inference: Enabled
[05/19/2024-14:15:37] [I] Save engine: C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\onnx\2x_AnimeJaNai_HD_V3_UltraCompact.engine
[05/19/2024-14:15:37] [I] Load engine:
[05/19/2024-14:15:37] [I] Profiling verbosity: 0
[05/19/2024-14:15:37] [I] Tactic sources: cublas [OFF], cublasLt [OFF], cudnn [OFF],
[05/19/2024-14:15:37] [I] timingCacheMode: local
[05/19/2024-14:15:37] [I] timingCacheFile:
[05/19/2024-14:15:37] [I] Enable Compilation Cache: Enabled
[05/19/2024-14:15:37] [I] errorOnTimingCacheMiss: Disabled
[05/19/2024-14:15:37] [I] Heuristic: Disabled
[05/19/2024-14:15:37] [I] Preview Features: Use default preview flags.
[05/19/2024-14:15:37] [I] MaxAuxStreams: -1
[05/19/2024-14:15:37] [I] BuilderOptimizationLevel: 4
[05/19/2024-14:15:37] [I] Calibration Profile Index: 0
[05/19/2024-14:15:37] [I] Input(s)s format: fp32:CHW
[05/19/2024-14:15:37] [I] Output(s)s format: fp32:CHW
[05/19/2024-14:15:37] [I] Input build shape (profile 0): input=1x3x8x8+1x3x1080x1920+1x3x1080x1920
[05/19/2024-14:15:37] [I] Input calibration shapes: model
[05/19/2024-14:15:37] [I] === System Options ===
[05/19/2024-14:15:37] [I] Device: 0
[05/19/2024-14:15:37] [I] DLACore:
[05/19/2024-14:15:37] [I] Plugins:
[05/19/2024-14:15:37] [I] setPluginsToSerialize:
[05/19/2024-14:15:37] [I] dynamicPlugins:
[05/19/2024-14:15:37] [I] ignoreParsedPluginLibs: 0
[05/19/2024-14:15:37] [I]
[05/19/2024-14:15:37] [I] === Inference Options ===
[05/19/2024-14:15:37] [I] Batch: Explicit
[05/19/2024-14:15:37] [I] Input inference shape : input=1x3x1080x1920
[05/19/2024-14:15:37] [I] Iterations: 10
[05/19/2024-14:15:37] [I] Duration: 3s (+ 200ms warm up)
[05/19/2024-14:15:37] [I] Sleep time: 0ms
[05/19/2024-14:15:42] [I] Idle time: 0ms
[05/19/2024-14:15:42] [I] Inference Streams: 4
[05/19/2024-14:15:42] [I] ExposeDMA: Disabled
[05/19/2024-14:15:42] [I] Data transfers: Enabled
[05/19/2024-14:15:42] [I] Spin-wait: Disabled
[05/19/2024-14:15:42] [I] Multithreading: Disabled
[05/19/2024-14:15:42] [I] CUDA Graph: Disabled
[05/19/2024-14:15:42] [I] Separate profiling: Disabled
[05/19/2024-14:15:42] [I] Time Deserialize: Disabled
[05/19/2024-14:15:42] [I] Time Refit: Disabled
[05/19/2024-14:15:42] [I] NVTX verbosity: 0
[05/19/2024-14:15:42] [I] Persistent Cache Ratio: 0
[05/19/2024-14:15:42] [I] Optimization Profile Index: 0
[05/19/2024-14:15:42] [I] Inputs:
[05/19/2024-14:15:42] [I] === Reporting Options ===
[05/19/2024-14:15:42] [I] Verbose: Disabled
[05/19/2024-14:15:42] [I] Averages: 10 inferences
[05/19/2024-14:15:42] [I] Percentiles: 90,95,99
[05/19/2024-14:15:42] [I] Dump refittable layers:Disabled
[05/19/2024-14:15:42] [I] Dump output: Disabled
[05/19/2024-14:15:42] [I] Profile: Disabled
[05/19/2024-14:15:42] [I] Export timing to JSON file:
[05/19/2024-14:15:42] [I] Export output to JSON file:
[05/19/2024-14:15:42] [I] Export profile to JSON file:
[05/19/2024-14:15:42] [I]
[05/19/2024-14:15:42] [I] === Device Information ===
[05/19/2024-14:15:42] [I] Available Devices:
[05/19/2024-14:15:42] [I] Device 0: "NVIDIA GeForce RTX 3080" UUID: GPU-44b0b0ec-4a4a-2291-c949-ad5f2d47ac82
[05/19/2024-14:15:42] [I] Selected Device: NVIDIA GeForce RTX 3080
[05/19/2024-14:15:42] [I] Selected Device ID: 0
[05/19/2024-14:15:42] [I] Selected Device UUID: GPU-44b0b0ec-4a4a-2291-c949-ad5f2d47ac82
[05/19/2024-14:15:42] [I] Compute Capability: 8.6
[05/19/2024-14:15:42] [I] SMs: 68
[05/19/2024-14:15:42] [I] Device Global Memory: 10239 MiB
[05/19/2024-14:15:42] [I] Shared Memory per SM: 100 KiB
[05/19/2024-14:15:42] [I] Memory Bus Width: 320 bits (ECC disabled)
[05/19/2024-14:15:42] [I] Application Compute Clock Rate: 1.8 GHz
[05/19/2024-14:15:42] [I] Application Memory Clock Rate: 9.501 GHz
[05/19/2024-14:15:42] [I]
[05/19/2024-14:15:42] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[05/19/2024-14:15:42] [I]
[05/19/2024-14:15:42] [I] TensorRT version: 9.2.0
[05/19/2024-14:15:42] [I] Loading standard plugins
[05/19/2024-14:15:43] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 8080, GPU 1166 (MiB)
[05/19/2024-14:15:50] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2726, GPU +312, now: CPU 11097, GPU 1478 (MiB)
[05/19/2024-14:15:50] [I] Start parsing network model.
[05/19/2024-14:15:50] [I] [TRT] ----------------------------------------------------------------
[05/19/2024-14:15:50] [I] [TRT] Input filename: C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\onnx\2x_AnimeJaNai_HD_V3_UltraCompact.onnx
[05/19/2024-14:15:50] [I] [TRT] ONNX IR version: 0.0.7
[05/19/2024-14:15:50] [I] [TRT] Opset version: 14
[05/19/2024-14:15:50] [I] [TRT] Producer name: pytorch
[05/19/2024-14:15:50] [I] [TRT] Producer version: 2.1.2
[05/19/2024-14:15:50] [I] [TRT] Domain:
[05/19/2024-14:15:50] [I] [TRT] Model version: 0
[05/19/2024-14:15:50] [I] [TRT] Doc string:
[05/19/2024-14:15:50] [I] [TRT] ----------------------------------------------------------------
[05/19/2024-14:15:50] [I] Finished parsing network model. Parse time: 0.0284306
[05/19/2024-14:15:50] [I] Set shape of input tensor input for optimization profile 0 to: MIN=1x3x8x8 OPT=1x3x1080x1920 MAX=1x3x1080x1920
[05/19/2024-14:15:50] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/19/2024-14:15:53] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_lessZero = lt(/Conv_output_0', /PRelu_zero), name=/PRelu_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:15:53] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_lessZero = lt(/Conv_output_0', /PRelu_zero), name=/PRelu_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_1_lessZero = lt(/Conv_1_output_0', /PRelu_1_zero), name=/PRelu_1_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_1_lessZero = lt(/Conv_1_output_0', /PRelu_1_zero), name=/PRelu_1_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_2_lessZero = lt(/Conv_2_output_0', /PRelu_2_zero), name=/PRelu_2_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_2_lessZero = lt(/Conv_2_output_0', /PRelu_2_zero), name=/PRelu_2_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_3_lessZero = lt(/Conv_3_output_0', /PRelu_3_zero), name=/PRelu_3_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_3_lessZero = lt(/Conv_3_output_0', /PRelu_3_zero), name=/PRelu_3_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_4_lessZero = lt(/Conv_4_output_0', /PRelu_4_zero), name=/PRelu_4_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_4_lessZero = lt(/Conv_4_output_0', /PRelu_4_zero), name=/PRelu_4_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_5_lessZero = lt(/Conv_5_output_0', /PRelu_5_zero), name=/PRelu_5_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_5_lessZero = lt(/Conv_5_output_0', /PRelu_5_zero), name=/PRelu_5_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_6_lessZero = lt(/Conv_6_output_0', /PRelu_6_zero), name=/PRelu_6_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_6_lessZero = lt(/Conv_6_output_0', /PRelu_6_zero), name=/PRelu_6_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_7_lessZero = lt(/Conv_7_output_0', /PRelu_7_zero), name=/PRelu_7_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_7_lessZero = lt(/Conv_7_output_0', /PRelu_7_zero), name=/PRelu_7_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_8_lessZero = lt(/Conv_8_output_0', /PRelu_8_zero), name=/PRelu_8_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_8_lessZero = lt(/Conv_8_output_0', /PRelu_8_zero), name=/PRelu_8_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:10] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[05/19/2024-14:16:11] [I] [TRT] Total Host Persistent Memory: 56032
[05/19/2024-14:16:11] [I] [TRT] Total Device Persistent Memory: 0
[05/19/2024-14:16:11] [I] [TRT] Total Scratch Memory: 4608
[05/19/2024-14:16:11] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 40 steps to complete.
[05/19/2024-14:16:11] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.6522ms to assign 4 blocks to 40 nodes requiring 1111454208 bytes.
[05/19/2024-14:16:11] [I] [TRT] Total Activation Memory: 1111454208
[05/19/2024-14:16:11] [I] [TRT] Total Weights Memory: 670720
[05/19/2024-14:16:11] [I] [TRT] Engine generation completed in 21.705 seconds.
[05/19/2024-14:16:11] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
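The same shape-inference error repeats for every PReLU node in the log above, so it may help to collapse the repeats before reading it. Here is a minimal sketch, assuming the log has been saved as plain text. The function name `summarize_shape_errors`, the two regexes, and the shortened `SAMPLE` excerpt are my own illustration and are not part of trtexec or vs-mlrt.

```python
import re
from collections import Counter

# Matches the line that names the failing node, e.g.
#   /PRelu_lessZero = lt(/Conv_output_0', /PRelu_zero), name=/PRelu_less
NODE_RE = re.compile(r"name=(\S+)$")

# Matches the line that states the dtype mismatch, e.g.
#   Input 0's element type (half) differs from input 1's element type (float).
MISMATCH_RE = re.compile(
    r"Input 0's element type \((\w+)\) differs from input 1's element type \((\w+)\)"
)


def summarize_shape_errors(log_text):
    """Count failing node names and dtype mismatches in a trtexec log."""
    nodes = Counter()
    mismatches = Counter()
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        node = NODE_RE.search(line)
        if node:
            nodes[node.group(1)] += 1
        mismatch = MISMATCH_RE.search(line)
        if mismatch:
            mismatches[(mismatch.group(1), mismatch.group(2))] += 1
    return nodes, mismatches


# Shortened excerpt in the same shape as the log above (illustration only).
SAMPLE = """\
[05/19/2024-14:15:53] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_lessZero = lt(/Conv_output_0', /PRelu_zero), name=/PRelu_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:15:53] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_lessZero = lt(/Conv_output_0', /PRelu_zero), name=/PRelu_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_1_lessZero = lt(/Conv_1_output_0', /PRelu_1_zero), name=/PRelu_1_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
"""

if __name__ == "__main__":
    node_counts, mismatch_counts = summarize_shape_errors(SAMPLE)
    for name, count in node_counts.items():
        print(f"{name}: {count} error(s)")
    for (lhs, rhs), count in mismatch_counts.items():
        print(f"{lhs} vs {rhs}: {count} occurrence(s)")
```

To run it on the full log, read the file into a string and pass it to `summarize_shape_errors` in place of `SAMPLE`.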