zhiqwang / yolort

yolort is a runtime stack for YOLOv5 on specialized accelerators such as TensorRT, LibTorch, ONNX Runtime, TVM and NCNN.
https://zhiqwang.com/yolort
GNU General Public License v3.0

Slower than expected GPU inference in `deployment/libtorch` example #273

Closed mattpopovich closed 2 years ago

mattpopovich commented 2 years ago

🐛 Describe the bug

I created some yolov5-rt-stack TorchScript models by following the script here. I then followed the README instructions to build the LibTorch C++ code. Everything works as expected, except that inference on the GPU is much slower (roughly 7x) than on the CPU.

Can you confirm these results, or am I doing something wrong? I believe that previously (in the July-August 2021 timeframe) I was seeing inference times in the 8-10 ms range.

v4.0:

Click to show v4.0

```console
root@pc:yolov5-rt-stack/deployment/libtorch/build# ./yolort_torch --input_source ../../../bus.jpg --checkpoint ../../../yolov5s-v4.0-RT-v0.5.2-YOLOv5.torchscript.pt --labelmap ../../../coco.names
Set CPU mode
Loading model
Model loaded
Run once on empty image
[W TensorImpl.h:1153] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
Pre-process takes : 18 ms
Inference takes : 106 ms
Detected labels:
 0
 0
 0
 5
 0
[ CPULongType{5} ]
Detected boxes:
 669.2656  391.3025  809.8663  885.2344
  54.0635  397.8318  235.9531  901.3731
 222.8834  406.8119  341.5572  854.7792
  18.6320  232.9767  810.9739  760.1169
   0.4640  502.0519   88.5140  887.0480
[ CPUFloatType{5,4} ]
Detected scores:
 0.8901
 0.8733
 0.8537
 0.7234
 0.3769
[ CPUFloatType{5} ]
root@pc:yolov5-rt-stack/deployment/libtorch/build# ./yolort_torch --input_source ../../../bus.jpg --checkpoint ../../../yolov5s-v4.0-RT-v0.5.2-YOLOv5.torchscript.pt --labelmap ../../../coco.names --gpu
Set GPU mode
Loading model
Model loaded
Run once on empty image
[W TensorImpl.h:1153] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
Pre-process takes : 21 ms
Inference takes : 748 ms
Detected labels:
 0
 0
 0
 5
 0
[ CUDALongType{5} ]
Detected boxes:
 669.2656  391.3025  809.8663  885.2344
  54.0635  397.8318  235.9531  901.3730
 222.8834  406.8120  341.5572  854.7791
  18.6320  232.9767  810.9739  760.1170
   0.4640  502.0522   88.5139  887.0480
[ CUDAFloatType{5,4} ]
Detected scores:
 0.8901
 0.8733
 0.8537
 0.7234
 0.3769
[ CUDAFloatType{5} ]
```

v6.0:

Click to show v6.0

```console
root@pc:yolov5-rt-stack/deployment/libtorch/build# ./yolort_torch --input_source ../../../bus.jpg --checkpoint ../../../yolov5s-v6.0-RT-v0.5.2-YOLOv5.torchscript.pt --labelmap ../../../coco.names
Set CPU mode
Loading model
Model loaded
Run once on empty image
[W TensorImpl.h:1153] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
Pre-process takes : 15 ms
Inference takes : 95 ms
Detected labels:
 0
 0
 0
 5
 0
[ CPULongType{5} ]
Detected boxes:
 224.5497  402.5811  342.7194  862.6057
  51.8626  398.3438  245.3290  906.3114
 679.8232  385.5574  809.3773  883.1394
   0.1952  201.8805  812.9611  786.3345
   0.0480  558.7347   75.8148  871.5754
[ CPUFloatType{5,4} ]
Detected scores:
 0.8959
 0.8846
 0.8579
 0.5181
 0.3932
[ CPUFloatType{5} ]
root@pc:yolov5-rt-stack/deployment/libtorch/build# ./yolort_torch --input_source ../../../bus.jpg --checkpoint ../../../yolov5s-v6.0-RT-v0.5.2-YOLOv5.torchscript.pt --labelmap ../../../coco.names --gpu
Set GPU mode
Loading model
Model loaded
Run once on empty image
[W TensorImpl.h:1153] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
Pre-process takes : 28 ms
Inference takes : 746 ms
Detected labels:
 0
 0
 0
 5
 0
[ CUDALongType{5} ]
Detected boxes:
 224.5497  402.5810  342.7194  862.6058
  51.8626  398.3439  245.3289  906.3113
 679.8232  385.5574  809.3773  883.1393
   0.1954  201.8804  812.9608  786.3347
   0.0480  558.7346   75.8148  871.5754
[ CUDAFloatType{5,4} ]
Detected scores:
 0.8959
 0.8846
 0.8579
 0.5181
 0.3932
[ CUDAFloatType{5} ]
```

Thanks again for all your help thus far. I'm going to look into `deployment/tensorrt` next to see what inference times I can get there.

Versions

Click to display Versions

```console
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.9.0a0+gitd69c22d
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.4.0-92-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.2.152
GPU models and configuration:
GPU 0: GeForce GTX 1080
GPU 1: GeForce GTX 1080
GPU 2: GeForce GTX 1080
Nvidia driver version: 460.91.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.9.0a0+gitd69c22d
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.10.0a0+300a8a4
[conda] Could not collect
```

zhiqwang commented 2 years ago

Hi @mattpopovich ,

It seems that PyTorch 1.9 requires two warm-up runs on the GPU, so we need to ignore the first two measured times. Could you test it again, or upgrade your PyTorch to 1.10.1? (See https://github.com/pytorch/pytorch/pull/58801 for more details.)
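For anyone reproducing this outside of the `yolort_torch` example, a minimal sketch of the warm-up-then-time pattern being described might look like the following. This is only an illustration, not code from this repository: the checkpoint path and input size are placeholders, and it assumes a CUDA-enabled LibTorch build where `torch::cuda::synchronize()` is available.

```cpp
// Minimal warm-up/timing sketch for a TorchScript model on CUDA (illustrative only).
#include <chrono>
#include <iostream>
#include <vector>

#include <torch/script.h>
#include <torch/torch.h>

int main() {
  // Placeholder path: substitute your exported yolort TorchScript checkpoint.
  torch::jit::script::Module module = torch::jit::load("yolov5s.torchscript.pt");
  module.to(torch::kCUDA);
  module.eval();

  std::vector<torch::jit::IValue> inputs;
  inputs.emplace_back(torch::rand({1, 3, 640, 640}, torch::kCUDA));

  torch::NoGradGuard no_grad;

  // Warm-up: the first couple of forward calls pay one-off CUDA/cuDNN
  // initialization and JIT optimization costs, so discard their timings.
  for (int i = 0; i < 3; ++i) {
    module.forward(inputs);
  }
  torch::cuda::synchronize();

  // Timed run.
  auto start = std::chrono::steady_clock::now();
  module.forward(inputs);
  torch::cuda::synchronize();  // wait for the GPU before stopping the clock
  auto end = std::chrono::steady_clock::now();

  std::cout << "Inference takes : "
            << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
            << " ms" << std::endl;
  return 0;
}
```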

The TensorRT C++ part is still under development. We have implemented the core parts of the model conversion, but several pieces still need to be implemented:

  1. We use the `YOLO.load_from_yolov5()` strategy for TensorRT, so we need to implement the pre-processing in the C++ example; the existing version is a bit rough (see the sketch after this list).
  2. We use a static shape mechanism when converting the model to a TensorRT engine; we need to add dynamic shape support, which is very important for practical applications. See #266.
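As a rough illustration of the pre-processing in point 1, a YOLOv5-style letterbox in C++/OpenCV could be sketched as below. This is only a sketch of the idea, not the code shipped in `deployment/`; the function name and signature are made up for the example.

```cpp
// Illustrative YOLOv5-style letterbox: resize while keeping the aspect ratio and
// pad the remainder with gray (114, 114, 114), as YOLOv5's letterbox does.
#include <algorithm>
#include <cmath>

#include <opencv2/opencv.hpp>

cv::Mat letterbox(const cv::Mat& img, int new_w, int new_h,
                  float& scale, int& pad_w, int& pad_h) {
  scale = std::min(new_w / static_cast<float>(img.cols),
                   new_h / static_cast<float>(img.rows));
  const int resized_w = static_cast<int>(std::round(img.cols * scale));
  const int resized_h = static_cast<int>(std::round(img.rows * scale));
  pad_w = (new_w - resized_w) / 2;
  pad_h = (new_h - resized_h) / 2;

  cv::Mat resized;
  cv::resize(img, resized, cv::Size(resized_w, resized_h));

  cv::Mat out;
  cv::copyMakeBorder(resized, out,
                     pad_h, new_h - resized_h - pad_h,
                     pad_w, new_w - resized_w - pad_w,
                     cv::BORDER_CONSTANT, cv::Scalar(114, 114, 114));
  return out;
}
```

The returned scale and padding would then be needed again after inference to map the predicted boxes back to the original image.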

And all contributions are welcome here!

mattpopovich commented 2 years ago

Great find! I ran far too many tests on my machine (below) with PyTorch, TorchVision, and OpenCV built from source. (Originally I was seeing slow inference no matter how many times I "warmed up" the model, but I have since been unable to reproduce that.)

It looks like three warm-up runs are necessary for all recent versions of PyTorch.
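A stripped-down way to check how many warm-up iterations are needed is to time every forward call and watch where the latency settles. The helper below is only illustrative (the name `benchmark_iterations` is made up, and the module/input setup is assumed to be the same as in the sketch in the previous comment):

```cpp
// Illustrative helper: run `iters` forward passes and print each latency, so the
// slow first iterations (the warm-up) are easy to spot.
#include <chrono>
#include <iostream>
#include <vector>

#include <torch/script.h>
#include <torch/torch.h>

void benchmark_iterations(torch::jit::script::Module& module,
                          std::vector<torch::jit::IValue>& inputs,
                          int iters = 10) {
  torch::NoGradGuard no_grad;
  for (int i = 0; i < iters; ++i) {
    auto start = std::chrono::steady_clock::now();
    module.forward(inputs);
    torch::cuda::synchronize();  // make sure the GPU work has finished
    auto end = std::chrono::steady_clock::now();
    std::cout << "iteration " << i << ": "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;
  }
}
```

The configurations I tested: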


CUDA 11.4.3, PyTorch 1.10.1, TorchVision 0.11.2, OpenCV 4.5.5:

Click to show software configuration

```console
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git302ee7b
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.152
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.0
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.10.0a0+git302ee7b
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.11.0a0+e7ec7e2
[conda] Could not collect
```

CUDA 11.4.2, PyTorch 1.10.1, TorchVision 0.11.2, OpenCV 4.5.5:

Click to show software configuration

```console
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git302ee7b
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.10.0a0+git302ee7b
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.11.0a0+e7ec7e2
[conda] Could not collect
```

CUDA 11.4.2, PyTorch 1.10.0, TorchVision 0.11.1, OpenCV 4.5.4:

Click to show software configuration

```console
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git36449ea
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.10.0a0+git36449ea
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.11.0a0+fa347eb
[conda] Could not collect
```

CUDA 11.4.1, PyTorch 1.10.1, TorchVision 0.11.2, OpenCV 4.5.5:

Click to show software configuration

```console
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git302ee7b
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.10.0a0+git302ee7b
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.11.0a0+e7ec7e2
[conda] Could not collect
```

CUDA 11.4.1, PyTorch 1.10.0 commit 3fd9dcf, TorchVision 0.11.1, OpenCV 4.5.4:

Click to show software configuration

```console
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git3fd9dcf
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.10.0a0+git3fd9dcf
[pip3] torchmetrics==0.6.1
[pip3] torchvision==0.11.0a0+fa347eb
[conda] Could not collect
```

CUDA 11.4.0, PyTorch 1.10.0, TorchVision 0.11.1, OpenCV 4.5.4:

Click to show software configuration

```console
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git36449ea
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.10.0a0+git36449ea
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.11.0a0+fa347eb
[conda] Could not collect
```

CUDA 11.3.1, PyTorch 1.9.1, TorchVision 0.10.1, OpenCV 4.5.4:

Click to show software configuration

```console
python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.9.0a0+gitdfbd030
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.9.0a0+gitdfbd030
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.10.0a0+ca1a620
[conda] Could not collect
```

CUDA 11.2.0, PyTorch 1.9.0, TorchVision 0.10.0, OpenCV 4.5.2:

Click to show software configuration

```console
python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.9.0a0+gitd69c22d
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.2.67
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.2
[pip3] torch==1.9.0a0+gitd69c22d
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.10.0a0+300a8a4
[conda] Could not collect
```


A different PC with everything pre-built, not running in Docker:

CUDA 11.5, PyTorch 1.10.0, TorchVision 0.11.0, OpenCV 4.5.3:

Click to show software configuration

```console
$ python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git36449ea
Is debug build: False
CUDA used to build PyTorch: 11.5
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~18.04) 9.4.0
Clang version: Could not collect
CMake version: version 3.21.3
Libc version: glibc-2.25
Python version: 3.6.9 (default, Dec 8 2021, 21:08:43) [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-1065-azure-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 11.5.119
GPU models and configuration: GPU 0: Tesla V100-PCIE-16GB
Nvidia driver version: 495.29.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0a0+git36449ea
[pip3] torchvision==0.11.0a0+cdacbe0
[conda] Could not collect
```

zhiqwang commented 2 years ago

Hi @mattpopovich, thanks for the detailed experimental data you provided here! I believe this phenomenon is now explained, so I'm closing this ticket, but let us know if you have further questions.