microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

Phi-3 does not load on iGPU vega 11 (Ryzen 2400g) #403


CodeI000I commented 2 months ago

I have the Vega 11 iGPU in my Ryzen 2400G, so I tried running Phi-3 with DirectML. When I run `python phi3-qa.py -m directml\directml-int4-awq-block-128` from phi-3-tutorial.md, I get this error and there is no load on the iGPU. I ran the same command several times; in some cases an answer of a couple of words did appear, but it came out very slowly (CPU load never exceeded 15%, and the integrated graphics were almost completely idle). Could someone please explain how this works and why this is happening? (screenshot attached: PHI3)
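One quick way to narrow this down is to check whether the DirectML execution provider is even visible on the machine. This is a minimal diagnostic sketch, assuming the standalone onnxruntime-directml Python package is installed; onnxruntime-genai bundles its own runtime, so this only checks the system setup, not onnxruntime-genai itself:

```python
# Diagnostic sketch: list the execution providers onnxruntime can see.
# Assumes `pip install onnxruntime-directml` has been run; this probes the
# system's DirectML support, not the onnxruntime-genai package.
import onnxruntime as ort

print(ort.get_available_providers())
# If 'DmlExecutionProvider' is not in this list (e.g. only
# 'CPUExecutionProvider' appears), the CPU-only onnxruntime package is
# installed and inference will end up on the CPU.
```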

I'm using the Phi-3 model with ONNX in the hope that it will run faster than the regular one (I used the regular version of Phi-3 via Ollama and wasn't satisfied with the speed).

If we can get past this, can anyone tell me how to use Phi-3 from Python code? In README.md there is an example of integrating Phi-2 into a Python program; will everything work the same with Phi-3? README.md also links to the full documentation (https://onnxruntime.ai/docs/genai), but in the tutorials section I likewise only found Phi-2.

If anyone has free time, could you please describe step by step how to properly integrate Phi-3 with ONNX in Python code (which API to use, where to find ready-made examples of such integrations, etc.)? I searched for guides on YouTube, but the model was released too recently for quality tutorials to have appeared. Sorry, I'm a real beginner; thanks for any advice.

natke commented 2 months ago

Hi @CodeI000I, you did the right thing in running the phi3-qa.py tutorial. It sounds like you are hitting an issue; we will try to repro at our end, as well as improve the documentation.
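On the integration question: the Phi-2 example in README.md carries over to Phi-3 essentially unchanged, since onnxruntime-genai exposes the same Python API for both; only the model folder and the chat prompt template differ. A minimal sketch along the lines of phi3-qa.py (the model path, prompt, and search options here are placeholders to adjust):

```python
import onnxruntime_genai as og

# Path to the downloaded DirectML model folder (adjust to your layout).
model = og.Model("directml/directml-int4-awq-block-128")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

# Phi-3 expects this chat template; Phi-2 takes plain-text prompts instead.
prompt = "<|user|>\nWhat is an iGPU?<|end|>\n<|assistant|>\n"
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = input_tokens

# Generate token by token, streaming the decoded text as it is produced.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)
print()
```

The execution provider is determined by which package is installed (onnxruntime-genai for CPU, onnxruntime-genai-directml for DirectML, onnxruntime-genai-cuda for CUDA) together with the model folder's genai_config.json, not by anything in this code.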

natke commented 1 month ago

Hi @CodeI000I, can you try upgrading to the latest version (0.2.0) of onnxruntime-genai-directml?
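After upgrading, it is worth confirming which version Python actually imports, in case multiple onnxruntime-genai packages are present. A small check, assuming the package exposes `__version__` as recent releases do:

```python
# After `pip install --upgrade onnxruntime-genai-directml`, verify the
# imported version (assumes the package exposes __version__).
import onnxruntime_genai as og
print(og.__version__)  # expect 0.2.0 or later
```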

computHome commented 3 weeks ago

I tried it too, using version 0.3.0rc2 of onnxruntime-genai-directml. There is no load on the GPU during inference; running on the CPU is faster than running with DirectML.

Below is my CPU spec with integrated GPU:

- Processor: AMD Ryzen 3 PRO 4350G with Radeon Graphics, 3.80 GHz
- Installed RAM: 32.0 GB (31.4 GB usable)
- System type: 64-bit operating system, x64-based processor