Closed: ybwowen closed this issue 8 months ago.
The "Unsupported SM" error occurs when your TensorRT/CUDA version is not compatible with your GPU device.
Thanks for the reminder; I will check my CUDA and TensorRT versions then. What about the INT64 warnings? Do they matter or not?
It is okay. You can ignore those warnings.
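For anyone wondering what that warning actually means: the ONNX parser casts INT64 weights down to INT32, and any value that does not fit is clamped to the INT32 bounds. A minimal illustration of that clamping behavior (the `cast_down` helper is made up for this sketch, it is not a TensorRT API):

```python
# What the onnx2trt warning means in practice: INT64 weights are cast to
# INT32, and any value outside the INT32 range is clamped to its bounds.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def cast_down(value: int) -> int:
    """Clamp an INT64 weight into the INT32 range, as the parser does."""
    return max(INT32_MIN, min(value, INT32_MAX))

print(cast_down(2**31))   # -> 2147483647 (clamped to INT32 max)
print(cast_down(-2**40))  # -> -2147483648 (clamped to INT32 min)
print(cast_down(1000))    # -> 1000 (already fits, unchanged)
```

Since shape indices in these models rarely come anywhere near 2^31, the clamping is harmless here, which is why the warnings can be ignored.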
> The "Unsupported SM" error occurs when your TensorRT/CUDA version is not compatible with your GPU device.
After reinstalling CUDA, cuDNN, and TensorRT, the problem is fixed. (Note: the CUDA version for the 4060 should be 11.8 or newer.)
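Before reinstalling, it can help to confirm which toolchain pieces are actually on the PATH. A small sketch along these lines reports the installed versions without crashing when a tool is missing (the `tool_version` helper is illustrative, not part of any package):

```python
# A minimal sketch for checking the installed toolchain before rebuilding
# the engine: it runs each version command and degrades gracefully when a
# tool is not installed, instead of raising FileNotFoundError.
import shutil
import subprocess

def tool_version(cmd: list[str]) -> str:
    """Return the tool's version output, or a note if it is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]}: not found"
    out = subprocess.run(cmd, capture_output=True, text=True)
    return out.stdout.strip() or out.stderr.strip()

print(tool_version(["nvcc", "--version"]))     # CUDA toolkit version
print(tool_version(["nvidia-smi"]))            # driver / GPU summary
print(tool_version(["trtexec", "--version"]))  # confirms trtexec is on PATH
```

If `nvcc` reports anything older than 11.8 on an RTX 4060, the reinstall described above is the fix.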
I have exported the ONNX file with export_to_onnx.py. However, I encountered errors when executing:
trtexec --onnx=depth_anything_vits14.onnx --saveEngine=depth_anything_vits14.engine
Here is the log:

&&&& RUNNING TensorRT.trtexec [TensorRT v8003] # trtexec --onnx=depth_anything_vits14.onnx --saveEngine=depth_anything_vits14.engine
[01/29/2024-18:01:10] [I] === Model Options ===
[01/29/2024-18:01:10] [I] Format: ONNX
[01/29/2024-18:01:10] [I] Model: depth_anything_vits14.onnx
[01/29/2024-18:01:10] [I] Output:
[01/29/2024-18:01:10] [I] === Build Options ===
[01/29/2024-18:01:10] [I] Max batch: explicit
[01/29/2024-18:01:10] [I] Workspace: 16 MiB
[01/29/2024-18:01:10] [I] minTiming: 1
[01/29/2024-18:01:10] [I] avgTiming: 8
[01/29/2024-18:01:10] [I] Precision: FP32
[01/29/2024-18:01:10] [I] Calibration:
[01/29/2024-18:01:10] [I] Refit: Disabled
[01/29/2024-18:01:10] [I] Sparsity: Disabled
[01/29/2024-18:01:10] [I] Safe mode: Disabled
[01/29/2024-18:01:10] [I] Restricted mode: Disabled
[01/29/2024-18:01:10] [I] Save engine: depth_anything_vits14.engine
[01/29/2024-18:01:10] [I] Load engine:
[01/29/2024-18:01:10] [I] NVTX verbosity: 0
[01/29/2024-18:01:10] [I] Tactic sources: Using default tactic sources
[01/29/2024-18:01:10] [I] timingCacheMode: local
[01/29/2024-18:01:10] [I] timingCacheFile:
[01/29/2024-18:01:10] [I] Input(s)s format: fp32:CHW
[01/29/2024-18:01:10] [I] Output(s)s format: fp32:CHW
[01/29/2024-18:01:10] [I] Input build shapes: model
[01/29/2024-18:01:10] [I] Input calibration shapes: model
[01/29/2024-18:01:10] [I] === System Options ===
[01/29/2024-18:01:10] [I] Device: 0
[01/29/2024-18:01:10] [I] DLACore:
[01/29/2024-18:01:10] [I] Plugins:
[01/29/2024-18:01:10] [I] === Inference Options ===
[01/29/2024-18:01:10] [I] Batch: Explicit
[01/29/2024-18:01:10] [I] Input inference shapes: model
[01/29/2024-18:01:10] [I] Iterations: 10
[01/29/2024-18:01:10] [I] Duration: 3s (+ 200ms warm up)
[01/29/2024-18:01:10] [I] Sleep time: 0ms
[01/29/2024-18:01:10] [I] Streams: 1
[01/29/2024-18:01:10] [I] ExposeDMA: Disabled
[01/29/2024-18:01:10] [I] Data transfers: Enabled
[01/29/2024-18:01:10] [I] Spin-wait: Disabled
[01/29/2024-18:01:10] [I] Multithreading: Disabled
[01/29/2024-18:01:10] [I] CUDA Graph: Disabled
[01/29/2024-18:01:10] [I] Separate profiling: Disabled
[01/29/2024-18:01:10] [I] Time Deserialize: Disabled
[01/29/2024-18:01:10] [I] Time Refit: Disabled
[01/29/2024-18:01:10] [I] Skip inference: Disabled
[01/29/2024-18:01:10] [I] Inputs:
[01/29/2024-18:01:10] [I] === Reporting Options ===
[01/29/2024-18:01:10] [I] Verbose: Disabled
[01/29/2024-18:01:10] [I] Averages: 10 inferences
[01/29/2024-18:01:10] [I] Percentile: 99
[01/29/2024-18:01:10] [I] Dump refittable layers: Disabled
[01/29/2024-18:01:10] [I] Dump output: Disabled
[01/29/2024-18:01:10] [I] Profile: Disabled
[01/29/2024-18:01:10] [I] Export timing to JSON file:
[01/29/2024-18:01:10] [I] Export output to JSON file:
[01/29/2024-18:01:10] [I] Export profile to JSON file:
[01/29/2024-18:01:10] [I]
[01/29/2024-18:01:10] [I] === Device Information ===
[01/29/2024-18:01:10] [I] Selected Device: NVIDIA GeForce RTX 4060 Laptop GPU
[01/29/2024-18:01:10] [I] Compute Capability: 8.9
[01/29/2024-18:01:10] [I] SMs: 24
[01/29/2024-18:01:10] [I] Compute Clock Rate: 2.25 GHz
[01/29/2024-18:01:10] [I] Device Global Memory: 7931 MiB
[01/29/2024-18:01:10] [I] Shared Memory per SM: 100 KiB
[01/29/2024-18:01:10] [I] Memory Bus Width: 128 bits (ECC disabled)
[01/29/2024-18:01:10] [I] Memory Clock Rate: 8.001 GHz
[01/29/2024-18:01:10] [I]
[01/29/2024-18:01:10] [I] TensorRT version: 8003
[01/29/2024-18:01:10] [I] [TRT] [MemUsageChange] Init CUDA: CPU +837, GPU +0, now: CPU 844, GPU 624 (MiB)
[01/29/2024-18:01:10] [I] Start parsing network model
[01/29/2024-18:01:10] [I] [TRT] ----------------------------------------------------------------
[01/29/2024-18:01:10] [I] [TRT] Input filename: depth_anything_vits14.onnx
[01/29/2024-18:01:10] [I] [TRT] ONNX IR version: 0.0.6
[01/29/2024-18:01:10] [I] [TRT] Opset version: 11
[01/29/2024-18:01:10] [I] [TRT] Producer name: pytorch
[01/29/2024-18:01:10] [I] [TRT] Producer version: 1.12.1
[01/29/2024-18:01:10] [I] [TRT] Domain:
[01/29/2024-18:01:10] [I] [TRT] Model version: 0
[01/29/2024-18:01:10] [I] [TRT] Doc string:
[01/29/2024-18:01:10] [I] [TRT] ----------------------------------------------------------------
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:10] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[01/29/2024-18:01:11] [W] [TRT] Output type must be INT32 for shape outputs
[01/29/2024-18:01:11] [I] Finish parsing network model
[01/29/2024-18:01:11] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 944, GPU 624 (MiB)
[01/29/2024-18:01:11] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 944 MiB, GPU 624 MiB
[01/29/2024-18:01:12] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1140, GPU +278, now: CPU 2085, GPU 902 (MiB)
[01/29/2024-18:01:12] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +973, GPU +194, now: CPU 3058, GPU 1096 (MiB)
[01/29/2024-18:01:12] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[01/29/2024-18:01:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 4351, GPU 1476 (MiB)
[01/29/2024-18:01:13] [E] Error[1]: [caskUtils.cpp::trtSmToCask::114] Error Code 1: Internal Error (Unsupported SM: 0x809)
[01/29/2024-18:01:13] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
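For what it's worth, the `0x809` in the error is just the GPU's compute capability 8.9 encoded as a hex pair, which matches the "Compute Capability: 8.9" line in the device information above: the builder is rejecting the Ada GPU itself because TensorRT 8.0.3 predates SM 8.9 support. A toy sketch showing the encoding and the CUDA-version constraint (both helper names are made up for illustration; SM 8.9 kernels require CUDA 11.8 or newer):

```python
# Decode the "Unsupported SM: 0x809" message: TensorRT logs the compute
# capability as (major << 8) | minor, so 8.9 becomes 0x809.

def sm_hex(major: int, minor: int) -> str:
    """Encode a compute capability the way the TensorRT error prints it."""
    return hex((major << 8) | minor)

def ada_supported(cuda_version: tuple) -> bool:
    """SM 8.9 (Ada Lovelace) kernels need CUDA 11.8 or newer to build."""
    return cuda_version >= (11, 8)

print(sm_hex(8, 9))            # -> 0x809, matching the error message
print(ada_supported((11, 6)))  # -> False: too old for an RTX 4060
print(ada_supported((12, 1)))  # -> True
```

This is consistent with the fix reported above: upgrading to a CUDA 11.8+ / Ada-aware TensorRT build makes the error go away.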