spacewalk01 / depth-anything-tensorrt

TensorRT implementation of Depth-Anything V1, V2
https://depth-anything.github.io/
MIT License

Fail to create engine #6

Closed · ybwowen closed this issue 8 months ago

ybwowen commented 8 months ago

I exported the ONNX file with export_to_onnx.py. However, I encountered errors when executing:

```
trtexec --onnx=depth_anything_vits14.onnx --saveEngine=depth_anything_vits14.engine
```

Here is the log:

```
&&&& RUNNING TensorRT.trtexec [TensorRT v8003] # trtexec --onnx=depth_anything_vits14.onnx --saveEngine=depth_anything_vits14.engine
[01/29/2024-18:01:10] [I] === Model Options ===
[01/29/2024-18:01:10] [I] Format: ONNX
[01/29/2024-18:01:10] [I] Model: depth_anything_vits14.onnx
[01/29/2024-18:01:10] [I] Output:
[01/29/2024-18:01:10] [I] === Build Options ===
[01/29/2024-18:01:10] [I] Max batch: explicit
[01/29/2024-18:01:10] [I] Workspace: 16 MiB
[01/29/2024-18:01:10] [I] minTiming: 1
[01/29/2024-18:01:10] [I] avgTiming: 8
[01/29/2024-18:01:10] [I] Precision: FP32
[01/29/2024-18:01:10] [I] Calibration:
[01/29/2024-18:01:10] [I] Refit: Disabled
[01/29/2024-18:01:10] [I] Sparsity: Disabled
[01/29/2024-18:01:10] [I] Safe mode: Disabled
[01/29/2024-18:01:10] [I] Restricted mode: Disabled
[01/29/2024-18:01:10] [I] Save engine: depth_anything_vits14.engine
[01/29/2024-18:01:10] [I] Load engine:
[01/29/2024-18:01:10] [I] NVTX verbosity: 0
[01/29/2024-18:01:10] [I] Tactic sources: Using default tactic sources
[01/29/2024-18:01:10] [I] timingCacheMode: local
[01/29/2024-18:01:10] [I] timingCacheFile:
[01/29/2024-18:01:10] [I] Input(s)s format: fp32:CHW
[01/29/2024-18:01:10] [I] Output(s)s format: fp32:CHW
[01/29/2024-18:01:10] [I] Input build shapes: model
[01/29/2024-18:01:10] [I] Input calibration shapes: model
[01/29/2024-18:01:10] [I] === System Options ===
[01/29/2024-18:01:10] [I] Device: 0
[01/29/2024-18:01:10] [I] DLACore:
[01/29/2024-18:01:10] [I] Plugins:
[01/29/2024-18:01:10] [I] === Inference Options ===
[01/29/2024-18:01:10] [I] Batch: Explicit
[01/29/2024-18:01:10] [I] Input inference shapes: model
[01/29/2024-18:01:10] [I] Iterations: 10
[01/29/2024-18:01:10] [I] Duration: 3s (+ 200ms warm up)
[01/29/2024-18:01:10] [I] Sleep time: 0ms
[01/29/2024-18:01:10] [I] Streams: 1
[01/29/2024-18:01:10] [I] ExposeDMA: Disabled
[01/29/2024-18:01:10] [I] Data transfers: Enabled
[01/29/2024-18:01:10] [I] Spin-wait: Disabled
[01/29/2024-18:01:10] [I] Multithreading: Disabled
[01/29/2024-18:01:10] [I] CUDA Graph: Disabled
[01/29/2024-18:01:10] [I] Separate profiling: Disabled
[01/29/2024-18:01:10] [I] Time Deserialize: Disabled
[01/29/2024-18:01:10] [I] Time Refit: Disabled
[01/29/2024-18:01:10] [I] Skip inference: Disabled
[01/29/2024-18:01:10] [I] Inputs:
[01/29/2024-18:01:10] [I] === Reporting Options ===
[01/29/2024-18:01:10] [I] Verbose: Disabled
[01/29/2024-18:01:10] [I] Averages: 10 inferences
[01/29/2024-18:01:10] [I] Percentile: 99
[01/29/2024-18:01:10] [I] Dump refittable layers: Disabled
[01/29/2024-18:01:10] [I] Dump output: Disabled
[01/29/2024-18:01:10] [I] Profile: Disabled
[01/29/2024-18:01:10] [I] Export timing to JSON file:
[01/29/2024-18:01:10] [I] Export output to JSON file:
[01/29/2024-18:01:10] [I] Export profile to JSON file:
[01/29/2024-18:01:10] [I]
[01/29/2024-18:01:10] [I] === Device Information ===
[01/29/2024-18:01:10] [I] Selected Device: NVIDIA GeForce RTX 4060 Laptop GPU
[01/29/2024-18:01:10] [I] Compute Capability: 8.9
[01/29/2024-18:01:10] [I] SMs: 24
[01/29/2024-18:01:10] [I] Compute Clock Rate: 2.25 GHz
[01/29/2024-18:01:10] [I] Device Global Memory: 7931 MiB
[01/29/2024-18:01:10] [I] Shared Memory per SM: 100 KiB
[01/29/2024-18:01:10] [I] Memory Bus Width: 128 bits (ECC disabled)
[01/29/2024-18:01:10] [I] Memory Clock Rate: 8.001 GHz
[01/29/2024-18:01:10] [I]
[01/29/2024-18:01:10] [I] TensorRT version: 8003
[01/29/2024-18:01:10] [I] [TRT] [MemUsageChange] Init CUDA: CPU +837, GPU +0, now: CPU 844, GPU 624 (MiB)
[01/29/2024-18:01:10] [I] Start parsing network model
[01/29/2024-18:01:10] [I] [TRT] ----------------------------------------------------------------
[01/29/2024-18:01:10] [I] [TRT] Input filename: depth_anything_vits14.onnx
[01/29/2024-18:01:10] [I] [TRT] ONNX IR version: 0.0.6
[01/29/2024-18:01:10] [I] [TRT] Opset version: 11
[01/29/2024-18:01:10] [I] [TRT] Producer name: pytorch
[01/29/2024-18:01:10] [I] [TRT] Producer version: 1.12.1
[01/29/2024-18:01:10] [I] [TRT] Domain:
[01/29/2024-18:01:10] [I] [TRT] Model version: 0
[01/29/2024-18:01:10] [I] [TRT] Doc string:
[01/29/2024-18:01:10] [I] [TRT] ----------------------------------------------------------------
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:11] [W] [TRT] Output type must be INT32 for shape outputs
[01/29/2024-18:01:11] [I] Finish parsing network model
[01/29/2024-18:01:11] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 944, GPU 624 (MiB)
[01/29/2024-18:01:11] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 944 MiB, GPU 624 MiB
[01/29/2024-18:01:12] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1140, GPU +278, now: CPU 2085, GPU 902 (MiB)
[01/29/2024-18:01:12] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +973, GPU +194, now: CPU 3058, GPU 1096 (MiB)
[01/29/2024-18:01:12] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[01/29/2024-18:01:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 4351, GPU 1476 (MiB)
[01/29/2024-18:01:13] [E] Error[1]: [caskUtils.cpp::trtSmToCask::114] Error Code 1: Internal Error (Unsupported SM: 0x809)
[01/29/2024-18:01:13] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
```
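
For completeness, the exported file can be sanity-checked independently of TensorRT; a minimal sketch, assuming the `onnx` Python package is installed (the filename matches the export above):

```python
# Quick structural check of the exported ONNX model before engine building.
import onnx

model = onnx.load("depth_anything_vits14.onnx")
onnx.checker.check_model(model)  # raises if the graph itself is malformed
print("Opset:", model.opset_import[0].version)
for inp in model.graph.input:
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print("Input:", inp.name, dims)
```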

spacewalk01 commented 8 months ago

The unsupported SM error occurs because your TensorRT/CUDA version is not compatible with your GPU device.
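
In your log, trtexec reports TensorRT v8003 (8.0.3), while the RTX 4060 Laptop GPU reports compute capability 8.9 (the `0x809` in the error encodes exactly that). TensorRT 8.0.3 predates Ada GPUs, so it has no kernels for sm_89; you need a newer TensorRT/CUDA stack with Ada support. A minimal sketch for checking both sides of the compatibility, assuming the `tensorrt` and `torch` Python packages are installed:

```python
# Compare the installed TensorRT version against the GPU's SM.
# If the GPU's SM is newer than what the TensorRT release supports,
# engine building fails with "Unsupported SM" as above.
import tensorrt as trt
import torch

print("TensorRT version:", trt.__version__)
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU compute capability: sm_{major}{minor}")  # RTX 4060 -> sm_89
```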

ybwowen commented 8 months ago

Thanks for the reminder, I will check my CUDA and TensorRT versions then. What about the INT64 warnings? Do they matter or not?

spacewalk01 commented 8 months ago

It is okay. You can ignore those warnings.
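
The warnings just mean the PyTorch exporter stored some constants (typically shape and index tensors) as INT64, which TensorRT casts down to INT32. If you want to see which tensors are affected, a minimal sketch assuming the `onnx` package:

```python
# List the INT64 initializers that TensorRT will cast down to INT32
# (usually shape/index constants, which is why the warning is harmless here).
import onnx
from onnx import TensorProto

model = onnx.load("depth_anything_vits14.onnx")
int64_tensors = [t.name for t in model.graph.initializer
                 if t.data_type == TensorProto.INT64]
print(f"{len(int64_tensors)} INT64 initializers:", int64_tensors[:10])
```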

ybwowen commented 8 months ago

> The unsupported SM error occurs because your TensorRT/CUDA version is not compatible with your GPU device.

After reinstalling CUDA, cuDNN, and TensorRT, the problem is fixed. (Note: the CUDA version for the RTX 4060 should be at least 11.8.)
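
For reference, the build that trtexec performs can also be reproduced with the TensorRT Python API once a compatible stack is installed. This is a minimal sketch, not the repo's own code; it assumes TensorRT 8.4 or newer (for `set_memory_pool_limit`) on an Ada-capable release, and the 1 GiB workspace size is illustrative:

```python
# Parse the ONNX model and build a serialized engine, like trtexec does.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("depth_anything_vits14.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
serialized = builder.build_serialized_network(network, config)
if serialized is None:
    raise SystemExit("engine build failed (check TensorRT/CUDA vs. GPU SM)")
with open("depth_anything_vits14.engine", "wb") as f:
    f.write(serialized)
```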