styler00dollar / VSGAN-tensorrt-docker

Using VapourSynth with super resolution and interpolation models and speeding them up with TensorRT.
BSD 3-Clause "New" or "Revised" License

Unable to convert ONNX model to engine: "Cuda failure: unknown error" #67

Closed aarchangel64 closed 4 months ago

aarchangel64 commented 4 months ago

Hello, thank you for the project!

I am attempting to convert an ONNX model to a TensorRT engine using the command listed in the README:

trtexec --fp16 --onnx=models/rife414_lite_ensembleTrue_op18_fp16_clamp.onnx --minShapes=input:1x8x64x64 --optShapes=input:1x8x720x1280 --maxShapes=input:1x8x1080x1920 --saveEngine=model.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --preview=+fasterDynamicShapes0805

However, it fails with the following output:

&&&& RUNNING TensorRT.trtexec [TensorRT v9300] # trtexec --fp16 --onnx=models/rife414_lite_ensembleTrue_op18_fp16_clamp.onnx --minShapes=input:1x8x64x64 --optShapes=input:1x8x720x1280 --maxShapes=input:1x8x1080x1920 --saveEngine=model.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --preview=+fasterDynamicShapes0805
[02/25/2024-22:28:18] [I] === Model Options ===
[02/25/2024-22:28:18] [I] Format: ONNX
[02/25/2024-22:28:18] [I] Model: models/rife414_lite_ensembleTrue_op18_fp16_clamp.onnx
[02/25/2024-22:28:18] [I] Output:
[02/25/2024-22:28:18] [I] === Build Options ===
[02/25/2024-22:28:18] [I] Max batch: explicit batch
[02/25/2024-22:28:18] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[02/25/2024-22:28:18] [I] minTiming: 1
[02/25/2024-22:28:18] [I] avgTiming: 8
[02/25/2024-22:28:18] [I] Precision: FP32+FP16
[02/25/2024-22:28:18] [I] LayerPrecisions:
[02/25/2024-22:28:18] [I] Layer Device Types:
[02/25/2024-22:28:18] [I] Calibration:
[02/25/2024-22:28:18] [I] Refit: Disabled
[02/25/2024-22:28:18] [I] Weightless: Disabled
[02/25/2024-22:28:18] [I] Version Compatible: Disabled
[02/25/2024-22:28:18] [I] ONNX Native InstanceNorm: Disabled
[02/25/2024-22:28:18] [I] TensorRT runtime: full
[02/25/2024-22:28:18] [I] Lean DLL Path:
[02/25/2024-22:28:18] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[02/25/2024-22:28:18] [I] Exclude Lean Runtime: Disabled
[02/25/2024-22:28:18] [I] Sparsity: Disabled
[02/25/2024-22:28:18] [I] Safe mode: Disabled
[02/25/2024-22:28:18] [I] Build DLA standalone loadable: Disabled
[02/25/2024-22:28:18] [I] Allow GPU fallback for DLA: Disabled
[02/25/2024-22:28:18] [I] DirectIO mode: Disabled
[02/25/2024-22:28:18] [I] Restricted mode: Disabled
[02/25/2024-22:28:18] [I] Skip inference: Enabled
[02/25/2024-22:28:18] [I] Save engine: model.engine
[02/25/2024-22:28:18] [I] Load engine:
[02/25/2024-22:28:18] [I] Profiling verbosity: 0
[02/25/2024-22:28:18] [I] Tactic sources: cublas [OFF], cublasLt [OFF], cudnn [ON],
[02/25/2024-22:28:18] [I] timingCacheMode: local
[02/25/2024-22:28:18] [I] timingCacheFile:
[02/25/2024-22:28:18] [I] Enable Compilation Cache: Enabled
[02/25/2024-22:28:18] [I] errorOnTimingCacheMiss: Disabled
[02/25/2024-22:28:18] [I] Heuristic: Disabled
[02/25/2024-22:28:18] [I] Preview Features: kFASTER_DYNAMIC_SHAPES_0805 [ON],
[02/25/2024-22:28:18] [I] MaxAuxStreams: -1
[02/25/2024-22:28:18] [I] BuilderOptimizationLevel: -1
[02/25/2024-22:28:18] [I] Calibration Profile Index: 0
[02/25/2024-22:28:18] [I] Input(s)s format: fp32:CHW
[02/25/2024-22:28:18] [I] Output(s)s format: fp32:CHW
[02/25/2024-22:28:18] [I] Input build shape (profile 0): input=1x8x64x64+1x8x720x1280+1x8x1080x1920
[02/25/2024-22:28:18] [I] Input calibration shapes: model
[02/25/2024-22:28:18] [I] === System Options ===
[02/25/2024-22:28:18] [I] Device: 0
[02/25/2024-22:28:18] [I] DLACore:
[02/25/2024-22:28:18] [I] Plugins:
[02/25/2024-22:28:18] [I] setPluginsToSerialize:
[02/25/2024-22:28:18] [I] dynamicPlugins:
[02/25/2024-22:28:18] [I] ignoreParsedPluginLibs: 0
[02/25/2024-22:28:18] [I]
[02/25/2024-22:28:18] [I] === Inference Options ===
[02/25/2024-22:28:18] [I] Batch: Explicit
[02/25/2024-22:28:18] [I] Input inference shape : input=1x8x720x1280
[02/25/2024-22:28:18] [I] Iterations: 10
[02/25/2024-22:28:18] [I] Duration: 3s (+ 200ms warm up)
[02/25/2024-22:28:18] [I] Sleep time: 0ms
[02/25/2024-22:28:18] [I] Idle time: 0ms
[02/25/2024-22:28:18] [I] Inference Streams: 1
[02/25/2024-22:28:18] [I] ExposeDMA: Disabled
[02/25/2024-22:28:18] [I] Data transfers: Enabled
[02/25/2024-22:28:18] [I] Spin-wait: Disabled
[02/25/2024-22:28:18] [I] Multithreading: Disabled
[02/25/2024-22:28:18] [I] CUDA Graph: Disabled
[02/25/2024-22:28:18] [I] Separate profiling: Disabled
[02/25/2024-22:28:18] [I] Time Deserialize: Disabled
[02/25/2024-22:28:18] [I] Time Refit: Disabled
[02/25/2024-22:28:18] [I] NVTX verbosity: 0
[02/25/2024-22:28:18] [I] Persistent Cache Ratio: 0
[02/25/2024-22:28:18] [I] Optimization Profile Index: 0
[02/25/2024-22:28:18] [I] Inputs:
[02/25/2024-22:28:18] [I] === Reporting Options ===
[02/25/2024-22:28:18] [I] Verbose: Disabled
[02/25/2024-22:28:18] [I] Averages: 10 inferences
[02/25/2024-22:28:18] [I] Percentiles: 90,95,99
[02/25/2024-22:28:18] [I] Dump refittable layers:Disabled
[02/25/2024-22:28:18] [I] Dump output: Disabled
[02/25/2024-22:28:18] [I] Profile: Disabled
[02/25/2024-22:28:18] [I] Export timing to JSON file:
[02/25/2024-22:28:18] [I] Export output to JSON file:
[02/25/2024-22:28:18] [I] Export profile to JSON file:
[02/25/2024-22:28:18] [I]
[02/25/2024-22:28:18] [I] === Device Information ===
Cuda failure: unknown error

I believe the GPU is detected inside the Docker container, since it shows up in nvidia-smi:

root@fbcfe47e10fc:/workspace/tensorrt# nvidia-smi
Sun Feb 25 22:30:08 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:0A:00.0  On |                  N/A |
| 30%   36C    P8              35W / 320W |   1385MiB / 10240MiB |     28%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

I'm not sure how to debug this; any help would be appreciated. Thank you!
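Not part of the original report, but one way to narrow down a "Cuda failure: unknown error" that occurs before trtexec even lists the device is to check whether the CUDA driver API itself initializes inside the container. This is a minimal sketch that calls cuInit(0) directly via ctypes; the function name and library (libcuda.so.1) are the standard CUDA driver API, but whether it is visible inside your container depends on your runtime setup:

```python
import ctypes

def cuda_init_status():
    """Try to initialize the CUDA driver API and report the result.

    Returns the raw CUresult code from cuInit(0), or None if
    libcuda.so.1 cannot be loaded at all (driver not mapped in).
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # driver library not visible in this environment
    return libcuda.cuInit(0)  # CUresult; 0 means CUDA_SUCCESS

status = cuda_init_status()
if status is None:
    print("libcuda.so.1 not found - NVIDIA driver not mapped into the container")
elif status == 0:
    print("cuInit succeeded - CUDA driver is usable")
else:
    print(f"cuInit failed with CUresult {status}")
```

A non-zero CUresult here (with nvidia-smi still working) points at a driver/kernel-module state problem rather than at trtexec or the model.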

aarchangel64 commented 4 months ago

Of course, I solved the issue right after posting it; sorry for the noise! I fixed it by removing and re-inserting the nvidia_uvm kernel module on my host OS, as described in this Stack Overflow answer.