zhiqwang / yolort

yolort is a runtime stack for YOLOv5 on specialized accelerators such as TensorRT, LibTorch, ONNX Runtime, TVM and NCNN.
https://zhiqwang.com/yolort
GNU General Public License v3.0

TensorRT environment Internal Error #385

Closed joihn closed 2 years ago

joihn commented 2 years ago

🐛 Describe the bug

Hi, thanks for the repo. I have an issue when exporting to TensorRT. Error:

[04/13/2022-10:56:25] [TRT] [W] Skipping tactic 0 due to Myelin error: cuBLAS error 1 querying major version.
[04/13/2022-10:56:25] [TRT] [E] 10: [optimizer.cpp::computeCosts::2033] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[Reshape_261 + Transpose_262...Concat_526]}.)

Minimal reproduction code (first download yolov5n6.pt, the checkpoint from the YOLOv5 repo):

from yolort.runtime.trt_helper import export_tensorrt_engine

path = "/home/joihn/Downloads/yolov5n6.pt"
export_tensorrt_engine(path)

It might be due to my env (details below).

Versions

TensorRT version: 8.4.0.6

PyTorch version: 1.10.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.13.0-39-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA TITAN Xp COLLECTORS EDITION
Nvidia driver version: 470.103.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] torch==1.10.0
[pip3] torchvision==0.11.1
[conda] numpy 1.21.4 pypi_0 pypi
[conda] torch 1.10.0 pypi_0 pypi
[conda] torchvision 0.11.1 pypi_0 pypi

zhiqwang commented 2 years ago

Hi @joihn, did you try the export_model.py CLI tool with

python tools/export_model.py --checkpoint_path yolov5n6.pt --include engine

See https://github.com/zhiqwang/yolov5-rt-stack/tree/main/deployment/tensorrt#usage for more details.

joihn commented 2 years ago

Thanks for the quick response. The error with the CLI is the same.

Here is the full output:

python tools/export_model.py --checkpoint_path /home/maxime/Downloads/yolov5n6.pt --include engine
Command Line Args: Namespace(checkpoint_path='/home/maxime/Downloads/yolov5n6.pt', include=['engine'], onnx_path=None, trt_path=None, skip_preprocess=False, score_thresh=0.25, nms_thresh=0.45, version='r6.0', image_size=[640, 640], size_divisible=32, batch_size=1, opset=11, simplify=False)
Loaded saved model from /home/maxime/Downloads/yolov5n6.pt
/home/maxime/anaconda3/envs/yolov5rt/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/maxime/Documents/code/yolov5-rt-stack/yolort/models/anchor_utils.py:46: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  anchors = torch.as_tensor(self.anchor_grids, dtype=torch.float32, device=device).to(dtype=dtype)
/home/maxime/Documents/code/yolov5-rt-stack/yolort/models/anchor_utils.py:47: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  strides = torch.as_tensor(self.strides, dtype=torch.float32, device=device).to(dtype=dtype)
/home/maxime/Documents/code/yolov5-rt-stack/yolort/relay/logits_decoder.py:45: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  strides = torch.as_tensor(self.strides, dtype=torch.float32, device=device).to(dtype=dtype)
/home/maxime/Documents/code/yolov5-rt-stack/yolort/models/box_head.py:337: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  for head_output, grid, shift, stride in zip(head_outputs, grids, shifts, strides):
PyTorch2ONNX graph created successfully
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] 'Shape tensor cast elision' routine failed with: None
Created NMS plugin 'EfficientNMS_TRT' with attributes: {'plugin_version': '1', 'background_class': -1, 'max_output_boxes': 100, 'score_threshold': 0.25, 'iou_threshold': 0.45, 'score_activation': False, 'box_coding': 0}
Saved ONNX model to /home/maxime/Downloads/yolov5n6.trt.onnx
[04/13/2022-11:11:15] [TRT] [I] [MemUsageChange] Init CUDA: CPU +196, GPU +0, now: CPU 369, GPU 3584 (MiB)
[04/13/2022-11:11:16] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 388 MiB, GPU 3584 MiB
[04/13/2022-11:11:16] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 396 MiB, GPU 3586 MiB
[04/13/2022-11:11:17] [TRT] [I] ----------------------------------------------------------------
[04/13/2022-11:11:17] [TRT] [I] Input filename:   /home/maxime/Downloads/yolov5n6.trt.onnx
[04/13/2022-11:11:17] [TRT] [I] ONNX IR version:  0.0.8
[04/13/2022-11:11:17] [TRT] [I] Opset version:    11
[04/13/2022-11:11:17] [TRT] [I] Producer name:    
[04/13/2022-11:11:17] [TRT] [I] Producer version: 
[04/13/2022-11:11:17] [TRT] [I] Domain:           
[04/13/2022-11:11:17] [TRT] [I] Model version:    0
[04/13/2022-11:11:17] [TRT] [I] Doc string:       
[04/13/2022-11:11:17] [TRT] [I] ----------------------------------------------------------------
[04/13/2022-11:11:17] [TRT] [W] onnx2trt_utils.cpp:365: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/13/2022-11:11:17] [TRT] [W] onnx2trt_utils.cpp:391: One or more weights outside the range of INT32 was clamped
[04/13/2022-11:11:17] [TRT] [W] onnx2trt_utils.cpp:391: One or more weights outside the range of INT32 was clamped
[04/13/2022-11:11:17] [TRT] [W] onnx2trt_utils.cpp:391: One or more weights outside the range of INT32 was clamped
[04/13/2022-11:11:17] [TRT] [W] onnx2trt_utils.cpp:391: One or more weights outside the range of INT32 was clamped
[04/13/2022-11:11:17] [TRT] [W] onnx2trt_utils.cpp:391: One or more weights outside the range of INT32 was clamped
[04/13/2022-11:11:17] [TRT] [I] No importer registered for op: EfficientNMS_TRT. Attempting to import as plugin.
[04/13/2022-11:11:17] [TRT] [I] Searching for plugin: EfficientNMS_TRT, plugin_version: 1, plugin_namespace: 
[04/13/2022-11:11:17] [TRT] [I] Successfully created plugin: EfficientNMS_TRT
Network Description
Input 'images' with shape (1, 3, 640, 640) and dtype DataType.FLOAT
Output 'num_detections' with shape (1, 1) and dtype DataType.INT32
Output 'detection_boxes' with shape (1, 100, 4) and dtype DataType.FLOAT
Output 'detection_scores' with shape (1, 100) and dtype DataType.FLOAT
Output 'detection_classes' with shape (1, 100) and dtype DataType.INT32
Building fp32 Engine in /home/maxime/Downloads/yolov5n6.engine
Using fp32 mode.
[04/13/2022-11:11:19] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 110.9.2
[04/13/2022-11:11:19] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +267, GPU +112, now: CPU 680, GPU 3698 (MiB)
[04/13/2022-11:11:19] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +112, GPU +46, now: CPU 792, GPU 3744 (MiB)
[04/13/2022-11:11:19] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/13/2022-11:11:42] [TRT] [W] Skipping tactic 0 due to Myelin error: cuBLAS error 1 querying major version.
[04/13/2022-11:11:42] [TRT] [E] 10: [optimizer.cpp::computeCosts::2033] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[Reshape_261 + Transpose_262...Concat_526]}.)
Traceback (most recent call last):
  File "/home/maxime/Documents/code/yolov5-rt-stack/tools/export_model.py", line 193, in <module>
    cli_main()
  File "/home/maxime/Documents/code/yolov5-rt-stack/tools/export_model.py", line 178, in cli_main
    exported_paths[1] = export_tensorrt(
  File "/home/maxime/Documents/code/yolov5-rt-stack/tools/export_model.py", line 128, in export_tensorrt
    export_tensorrt_engine(
  File "/home/maxime/Documents/code/yolov5-rt-stack/yolort/runtime/trt_helper.py", line 88, in export_tensorrt_engine
    engine_builder.create_engine(engine_path)
  File "/home/maxime/Documents/code/yolov5-rt-stack/yolort/runtime/trt_helper.py", line 201, in create_engine
    with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
AttributeError: __enter__
zhiqwang commented 2 years ago

From the log it seems this error is due to the TensorRT environment, but the ONNX model was exported correctly; you can find it in the same directory as yolov5n6.pt. Could you try trtexec with:

trtexec --onnx=yolov5n6.trt.onnx --saveEngine=yolov5n6.engine --workspace=8192
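As a side note, the trailing AttributeError: __enter__ in your traceback is a secondary symptom: when TensorRT cannot build the engine, builder.build_engine() returns None, and the with statement then has nothing to enter. A minimal sketch of a guard that would surface the underlying builder failure instead (an illustration, not the repository's actual code):

def build_and_save_engine(builder, network, config, engine_path):
    # build_engine() returns None when TensorRT fails to build the engine
    # (e.g. the "Could not find any implementation" error above); calling
    # `with None as engine:` is what raises AttributeError: __enter__.
    engine = builder.build_engine(network, config)
    if engine is None:
        raise RuntimeError("TensorRT engine build failed, see the TRT log above")
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
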
zhiqwang commented 2 years ago

Hi @joihn, see https://github.com/NVIDIA/TensorRT/issues/1917#issuecomment-1098723807 for a similar error.

trtexec also supports --saveEngine and --loadEngine. Did you also hit a failure when using trtexec? If not, you can check the trtexec source to see what differs between your code and trtexec.
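
If trtexec does produce a working engine, a quick sanity check from Python is to deserialize it with the TensorRT runtime and list its bindings. A minimal sketch, assuming trtexec saved yolov5n6.engine in the current directory (the path is an assumption):

import tensorrt as trt

# Sketch: deserialize the engine saved by trtexec and inspect its I/O bindings.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)

with open("yolov5n6.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    print(engine.get_binding_name(i), engine.get_binding_shape(i))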

zhiqwang commented 2 years ago

Hi @joihn, I'm closing this ticket due to ~a long time~ eight days of inactivity, but feel free to reopen it or create another ticket if you have further questions.

joihn commented 2 years ago

Sorry for the slow response; the issue was indeed due to a bad environment. Thanks for your help :)