aarchangel64 · Closed 9 months ago
Hello, thank you for the project!
I am attempting to convert an ONNX model to a TensorRT engine using the command listed in the README:
trtexec --fp16 --onnx=models/rife414_lite_ensembleTrue_op18_fp16_clamp.onnx --minShapes=input:1x8x64x64 --optShapes=input:1x8x720x1280 --maxShapes=input:1x8x1080x1920 --saveEngine=model.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --preview=+fasterDynamicShapes0805
However, it produces the following error output:
&&&& RUNNING TensorRT.trtexec [TensorRT v9300] # trtexec --fp16 --onnx=models/rife414_lite_ensembleTrue_op18_fp16_clamp.onnx --minShapes=input:1x8x64x64 --optShapes=input:1x8x720x1280 --maxShapes=input:1x8x1080x1920 --saveEngine=model.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --preview=+fasterDynamicShapes0805
[02/25/2024-22:28:18] [I] === Model Options ===
[02/25/2024-22:28:18] [I] Format: ONNX
[02/25/2024-22:28:18] [I] Model: models/rife414_lite_ensembleTrue_op18_fp16_clamp.onnx
[02/25/2024-22:28:18] [I] Output:
[02/25/2024-22:28:18] [I] === Build Options ===
[02/25/2024-22:28:18] [I] Max batch: explicit batch
[02/25/2024-22:28:18] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[02/25/2024-22:28:18] [I] minTiming: 1
[02/25/2024-22:28:18] [I] avgTiming: 8
[02/25/2024-22:28:18] [I] Precision: FP32+FP16
[02/25/2024-22:28:18] [I] LayerPrecisions:
[02/25/2024-22:28:18] [I] Layer Device Types:
[02/25/2024-22:28:18] [I] Calibration:
[02/25/2024-22:28:18] [I] Refit: Disabled
[02/25/2024-22:28:18] [I] Weightless: Disabled
[02/25/2024-22:28:18] [I] Version Compatible: Disabled
[02/25/2024-22:28:18] [I] ONNX Native InstanceNorm: Disabled
[02/25/2024-22:28:18] [I] TensorRT runtime: full
[02/25/2024-22:28:18] [I] Lean DLL Path:
[02/25/2024-22:28:18] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[02/25/2024-22:28:18] [I] Exclude Lean Runtime: Disabled
[02/25/2024-22:28:18] [I] Sparsity: Disabled
[02/25/2024-22:28:18] [I] Safe mode: Disabled
[02/25/2024-22:28:18] [I] Build DLA standalone loadable: Disabled
[02/25/2024-22:28:18] [I] Allow GPU fallback for DLA: Disabled
[02/25/2024-22:28:18] [I] DirectIO mode: Disabled
[02/25/2024-22:28:18] [I] Restricted mode: Disabled
[02/25/2024-22:28:18] [I] Skip inference: Enabled
[02/25/2024-22:28:18] [I] Save engine: model.engine
[02/25/2024-22:28:18] [I] Load engine:
[02/25/2024-22:28:18] [I] Profiling verbosity: 0
[02/25/2024-22:28:18] [I] Tactic sources: cublas [OFF], cublasLt [OFF], cudnn [ON],
[02/25/2024-22:28:18] [I] timingCacheMode: local
[02/25/2024-22:28:18] [I] timingCacheFile:
[02/25/2024-22:28:18] [I] Enable Compilation Cache: Enabled
[02/25/2024-22:28:18] [I] errorOnTimingCacheMiss: Disabled
[02/25/2024-22:28:18] [I] Heuristic: Disabled
[02/25/2024-22:28:18] [I] Preview Features: kFASTER_DYNAMIC_SHAPES_0805 [ON],
[02/25/2024-22:28:18] [I] MaxAuxStreams: -1
[02/25/2024-22:28:18] [I] BuilderOptimizationLevel: -1
[02/25/2024-22:28:18] [I] Calibration Profile Index: 0
[02/25/2024-22:28:18] [I] Input(s)s format: fp32:CHW
[02/25/2024-22:28:18] [I] Output(s)s format: fp32:CHW
[02/25/2024-22:28:18] [I] Input build shape (profile 0): input=1x8x64x64+1x8x720x1280+1x8x1080x1920
[02/25/2024-22:28:18] [I] Input calibration shapes: model
[02/25/2024-22:28:18] [I] === System Options ===
[02/25/2024-22:28:18] [I] Device: 0
[02/25/2024-22:28:18] [I] DLACore:
[02/25/2024-22:28:18] [I] Plugins:
[02/25/2024-22:28:18] [I] setPluginsToSerialize:
[02/25/2024-22:28:18] [I] dynamicPlugins:
[02/25/2024-22:28:18] [I] ignoreParsedPluginLibs: 0
[02/25/2024-22:28:18] [I]
[02/25/2024-22:28:18] [I] === Inference Options ===
[02/25/2024-22:28:18] [I] Batch: Explicit
[02/25/2024-22:28:18] [I] Input inference shape : input=1x8x720x1280
[02/25/2024-22:28:18] [I] Iterations: 10
[02/25/2024-22:28:18] [I] Duration: 3s (+ 200ms warm up)
[02/25/2024-22:28:18] [I] Sleep time: 0ms
[02/25/2024-22:28:18] [I] Idle time: 0ms
[02/25/2024-22:28:18] [I] Inference Streams: 1
[02/25/2024-22:28:18] [I] ExposeDMA: Disabled
[02/25/2024-22:28:18] [I] Data transfers: Enabled
[02/25/2024-22:28:18] [I] Spin-wait: Disabled
[02/25/2024-22:28:18] [I] Multithreading: Disabled
[02/25/2024-22:28:18] [I] CUDA Graph: Disabled
[02/25/2024-22:28:18] [I] Separate profiling: Disabled
[02/25/2024-22:28:18] [I] Time Deserialize: Disabled
[02/25/2024-22:28:18] [I] Time Refit: Disabled
[02/25/2024-22:28:18] [I] NVTX verbosity: 0
[02/25/2024-22:28:18] [I] Persistent Cache Ratio: 0
[02/25/2024-22:28:18] [I] Optimization Profile Index: 0
[02/25/2024-22:28:18] [I] Inputs:
[02/25/2024-22:28:18] [I] === Reporting Options ===
[02/25/2024-22:28:18] [I] Verbose: Disabled
[02/25/2024-22:28:18] [I] Averages: 10 inferences
[02/25/2024-22:28:18] [I] Percentiles: 90,95,99
[02/25/2024-22:28:18] [I] Dump refittable layers:Disabled
[02/25/2024-22:28:18] [I] Dump output: Disabled
[02/25/2024-22:28:18] [I] Profile: Disabled
[02/25/2024-22:28:18] [I] Export timing to JSON file:
[02/25/2024-22:28:18] [I] Export output to JSON file:
[02/25/2024-22:28:18] [I] Export profile to JSON file:
[02/25/2024-22:28:18] [I]
[02/25/2024-22:28:18] [I] === Device Information ===
Cuda failure: unknown error
I believe that the GPU is detected inside Docker, since it shows up in nvidia-smi:
root@fbcfe47e10fc:/workspace/tensorrt# nvidia-smi
Sun Feb 25 22:30:08 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:0A:00.0  On |                  N/A |
| 30%   36C    P8             35W / 320W  |  1385MiB / 10240MiB  |     28%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
I'm not sure how to debug this; any help would be appreciated. Thank you!
Of course, I solved this issue right after posting it; sorry for the noise! I managed to fix it by removing and re-inserting the nvidia_uvm kernel module on my host OS, as described in this Stack Overflow answer.
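For anyone who hits the same "Cuda failure: unknown error" inside the container, the reload looks roughly like this. This is a sketch of the workaround described above, not an official fix: it assumes the nvidia_uvm module is the culprit, must be run as root on the host (not inside the container), and will fail if any process still holds the module (stop GPU workloads first).

```shell
# Run on the HOST, not in the Docker container.
# Remove the nvidia_uvm kernel module (fails if a process is still using the GPU)...
sudo rmmod nvidia_uvm

# ...then load it again so CUDA can reinitialize.
sudo modprobe nvidia_uvm

# Verify the module is loaded again before retrying trtexec in the container.
lsmod | grep nvidia_uvm
```

After the reload, restarting the container and re-running the trtexec command should succeed if this was the cause.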