microsoft / planetary-computer-containers

Container definitions for the Planetary Computer
MIT License

Getting onnxruntime to work with CUDAExecutionProvider on gpu-pytorch container #33

Closed: weiji14 closed this issue 2 years ago

weiji14 commented 2 years ago

Hi again, just trying to use onnxruntime to run a neural network as a follow-up from  The CPU execution works fine, but GPU execution doesn't seem to work for some reason.

Steps to reproduce on the gpu-pytorch container.

pip install onnxruntime-gpu

Then restart the kernel before running the following:

import onnxruntime

print(onnxruntime.__version__)
# 1.11.0
print(onnxruntime.get_available_providers())
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

So it seems to know that there is a CUDA-capable GPU. But when I try to get an onnxruntime session going, it only picks up the CPU. Get a sample .onnx file, e.g. from

import onnxruntime
ort_session = onnxruntime.InferenceSession(
    "model.onnx",  # placeholder: path to the sample .onnx file
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = ort_session.get_inputs()[0].name

This produces a warning:

2022-04-15 15:09:38.624858540 [W:onnxruntime:Default, CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference to ensure all dependencies are met.
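Because onnxruntime only logs a warning and then carries on with the CPU provider, the fallback is easy to miss. One way to catch it is to compare the providers you requested with the ones the session actually enabled, which `InferenceSession.get_providers()` reports. This is a minimal sketch; `missing_providers`, `check_session` and the model path are hypothetical names for illustration, not part of onnxruntime.

```python
from typing import List, Sequence


def missing_providers(requested: Sequence[str], active: Sequence[str]) -> List[str]:
    """Return the requested execution providers the session did not enable."""
    active_set = set(active)
    return [p for p in requested if p not in active_set]


def check_session(model_path: str) -> List[str]:
    """Create a session for model_path and report any dropped providers.

    onnxruntime (or onnxruntime-gpu) is imported lazily so that
    missing_providers() can be used without it installed.
    """
    import onnxruntime

    requested = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    session = onnxruntime.InferenceSession(model_path, providers=requested)
    dropped = missing_providers(requested, session.get_providers())
    if dropped:
        # Mirrors the situation in this issue: CUDA requested, only CPU active.
        print(f"Warning: providers not enabled: {dropped}")
    return dropped
```

In the situation described above, `missing_providers(["CUDAExecutionProvider", "CPUExecutionProvider"], ["CPUExecutionProvider"])` would return `["CUDAExecutionProvider"]`, turning the silent fallback into something a notebook or test can assert on.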

Looking at the output of nvidia-smi, though, the CUDA version is 11.0, which should be OK if I understand correctly:

| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000001:00:00.0 Off |                  Off |
| N/A   30C    P8    11W /  70W |      0MiB / 16127MiB |      0%      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|  No running processes found                                                 |
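Note that the "CUDA Version" in the nvidia-smi banner is the newest CUDA version the installed driver supports; it is not the version of the cudatoolkit package inside the conda environment, which is what onnxruntime-gpu loads at runtime. A minimal sketch for pulling that number out programmatically, assuming `nvidia-smi` is on the PATH; the function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches e.g. "CUDA Version: 11.0" in the nvidia-smi banner.
_CUDA_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def parse_driver_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from nvidia-smi's banner, or None if absent.

    This is the newest CUDA version the *driver* supports, not the
    cudatoolkit version installed in the environment.
    """
    match = _CUDA_RE.search(smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version() -> Optional[Tuple[int, int]]:
    """Run nvidia-smi and parse its banner; None if it is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_driver_cuda_version(out)
```

Applied to the banner line above, `parse_driver_cuda_version` returns `(11, 0)`; the toolkit version has to be checked separately (for example in the conda environment's package list).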

So I'm wondering if there's some other library that needs to be added to the container to make onnxruntime's GPU execution work. Maybe related to
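One way to narrow down which library is missing is to ask the dynamic loader to open the CUDA shared libraries directly, using Python's standard `ctypes` module. This is a generic diagnostic, not something onnxruntime provides, and the sonames in `CANDIDATE_LIBS` are assumptions about what a CUDA 11 build of onnxruntime-gpu links against; adjust them to match the build you install.

```python
import ctypes
from typing import Iterable, List

# Assumed sonames for a CUDA 11 / cuDNN 8 build; adjust as needed.
CANDIDATE_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcudnn.so.8",
]


def unloadable_libraries(names: Iterable[str]) -> List[str]:
    """Return the shared libraries the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            # Not found on the loader search path (or failed to load).
            missing.append(name)
    return missing
```

Calling `unloadable_libraries(CANDIDATE_LIBS)` inside the container lists the CUDA 11 libraries it cannot find. In an environment that only ships cudatoolkit 10.2, the CUDA 11 sonames would likely show up in that list.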

I'd also like to ask: is there room to get onnxruntime into the gpu-pytorch image? I'm happy to submit a pull request to add it.

TomAugspurger commented 2 years ago

Can you try !mamba install -y -c conda-forge onnxruntime to see if that does the trick?

If that's successful I'll get it added to the gpu-pytorch image.

weiji14 commented 2 years ago

> Can you try !mamba install -y -c conda-forge onnxruntime to see if that does the trick?

Nope, that doesn't work. The conda-forge onnxruntime package seems to be CPU-only for now; need to wait for  to be merged.

I did manage to get it to work by updating cudatoolkit from 10.2 to 11.6 like so:

!mamba update -y cudatoolkit
!pip install onnxruntime-gpu

i.e. this line in the lockfile needs to change:

Is the plan to stick with CUDA 10.2, or can the next container update move to CUDA 11.x or newer?

TomAugspurger commented 2 years ago


We should be able to update to CUDA 11.x. I'll take a look at that this week.