microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

Querying the ids of all the DML devices and selecting a device for inference #360

Open jojo1899 opened 2 months ago

jojo1899 commented 2 months ago

I am working with onnxruntime-genai-directml 0.2.0rc3 and Mistral-7B for LLM inference. The code in model-qa.py generally works well for me. I tried it on two different hardware setups:

  1. A laptop with an Intel iGPU and Nvidia Laptop GPU
  2. A desktop with an AMD iGPU and Nvidia Desktop GPU

The code works well on my laptop and it seems to use the Nvidia Laptop GPU for inference. However, on the desktop, the code tries using the AMD iGPU and crashes with an AMD driver timeout error. I disabled the iGPU on my desktop and ran the code again. It used the Nvidia Desktop GPU and worked well.

Apparently, the device with id 0 on my laptop is the Nvidia Laptop GPU, whereas on my desktop it is the AMD iGPU (?). The Windows Task Manager on my laptop shows the following:

GPU 0
Intel UHD Graphics
Physical location: PCI bus 0, device 2, function 0

GPU 1
Nvidia GeForce RTX... Laptop GPU
Physical location: PCI bus 1, device 0, function 0

I do not understand how the code runs well and uses the Nvidia Laptop GPU on my laptop, where GPU 0 is the Intel UHD Graphics.

After disabling the AMD iGPU and re-enabling it, the Windows Task Manager on my desktop shows the following:

GPU 0
Nvidia GeForce RTX... Desktop GPU
Physical location: PCI bus 1, device 0, function 0

GPU 1
AMD Radeon...
Physical location: PCI bus 22, device 0, function 0

The code now works well on my desktop and uses the Nvidia Desktop GPU, but it did not do so by default. I suppose that disabling and re-enabling the iGPU changed the device ids.

Is there a way to query the device ids in my program and use the device that I want to use without any ambiguity?
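This is not an official onnxruntime-genai API, but the kind of unambiguous selection being asked for can be sketched in plain Python: given an enumerated list of (device_id, name) adapter pairs (however they are obtained on a given system, e.g. from Task Manager or a DXGI enumeration), pick a device by name rather than by a hard-coded index, so the choice survives id reshuffles like the one described above. The helper name and adapter list below are hypothetical.

```python
def pick_device_id(adapters, preferred_substring):
    """Return the device id whose adapter name contains the preferred
    substring (case-insensitive); fall back to id 0 if none matches.

    `adapters` is a list of (device_id, name) pairs -- a hypothetical
    stand-in for however the adapters are enumerated on the system.
    """
    for device_id, name in adapters:
        if preferred_substring.lower() in name.lower():
            return device_id
    return 0

# Example: the ids as reported on the desktop after re-enabling the iGPU
adapters = [(0, "Nvidia GeForce RTX Desktop GPU"), (1, "AMD Radeon")]
print(pick_device_id(adapters, "nvidia"))  # -> 0
print(pick_device_id(adapters, "radeon"))  # -> 1
```

Selecting by name rather than index sidesteps the ambiguity entirely: even if driver changes reorder the ids, the preferred adapter is still found.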

yufenglee commented 2 months ago

@PatriceVignola, could you please help take a look?

PatriceVignola commented 2 months ago

@jojo1899 We recently made changes that should select the most performant adapter by default. You can test it out if you build from source.

jojo1899 commented 2 months ago

Thanks, I will test it out today.

Does onnxruntime-genai also allow selecting a GPU via a device index (e.g., 0, 1, 2...)? Which device index represents which GPU is another discussion, but is there a way to specify an index and select the corresponding GPU? If it currently does not allow that, do you plan to introduce it in the near future?

yufenglee commented 2 months ago

@jojo1899, we have APIs to set the device id:

C API: https://github.com/microsoft/onnxruntime-genai/blob/2fe903ee81e86613549e3d063e8985c775028c2f/src/ort_genai_c.h#L232-L233
Python API: https://github.com/microsoft/onnxruntime-genai/blob/2fe903ee81e86613549e3d063e8985c775028c2f/src/python/python.cpp#L430-L431
C# API: https://github.com/microsoft/onnxruntime-genai/blob/2fe903ee81e86613549e3d063e8985c775028c2f/src/csharp/NativeMethods.cs#L175-L178
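Based on the linked python.cpp binding, the Python calls would presumably look like the sketch below. Whether they succeed may depend on which execution providers the installed build was compiled with (the DirectML builds reported later in this thread raise errors from these calls), so the sketch degrades gracefully when the package or provider is absent.

```python
try:
    import onnxruntime_genai as og

    # Function names taken from the linked python.cpp binding;
    # availability may vary by build (CUDA/ROCm vs DirectML).
    og.set_current_gpu_device_id(1)  # select the adapter with id 1
    print("current device id:", og.get_current_gpu_device_id())
except ImportError:
    print("onnxruntime_genai is not installed in this environment")
except Exception as e:
    # e.g. the "CUDA and/or ROCM execution provider is either not
    # enabled or not available" error reported later in this thread
    print("device-id API unavailable in this build:", e)
```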

AshD commented 1 month ago

@yufenglee I tried calling Utils.GetCurrentGpuDeviceId() in C# using the Microsoft.ML.OnnxRuntimeGenAI.DirectML 0.2.0-rc6 NuGet package, and it threw a Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'CUDA and/or ROCM execution provider is either not enabled or not available.'

I get the same error with Utils.SetCurrentGpuDeviceId(1).

natke commented 1 month ago

@AshD I have added your information to this issue #488

jeremyfowers commented 1 month ago

@yufenglee I am not having any luck with the Python API you linked above. I am trying things like

import onnxruntime_genai as og
current_device_id = og.get_current_gpu_device_id()

and

import onnxruntime_genai as og
og.set_current_gpu_device_id(1)

Both of these throw this error:

onnxruntime_genai.onnxruntime_genai.OrtException: D:\a\_work\1\s\include\onnxruntime\core/common/logging/logging.h:320 onnxruntime::logging::LoggingManager::DefaultLogger Attempt to use DefaultLogger but none has been registered.

Any advice?