ofirzaf opened this issue 1 month ago
I'm running into the same issue with an Intel Ultra 165U. The same prompts that DO work on an older Intel machine with an Iris GPU now produce the same error here, even on the smaller prompts. So this may be something related to the Intel GPU drivers and memory? The "shared memory" shown in the Windows Task Manager doesn't seem to reflect what's actually going on behind the scenes.
I'm using this from C# via the Microsoft.ML.OnnxRuntimeGenAI.DirectML package, and I'm not sure if there's a way to force execution to the CPU so I can check whether that works (I would expect so?).
@jorisdg, you can download the phi3-mini cpu model and try it out on CPU: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx/tree/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4
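For anyone trying the CPU model from Python: the example phi3-qa.py script wraps the user input in Phi-3's chat template before generation. A minimal sketch of that formatting step (the template string matches the one in the example script; the helper name is hypothetical):

```python
def format_phi3_prompt(user_input: str) -> str:
    # Phi-3 chat template as used by the phi3-qa.py example:
    # the user turn is delimited by <|user|> / <|end|>, and the
    # <|assistant|> marker cues the model to start its reply.
    chat_template = '<|user|>\n{input} <|end|>\n<|assistant|>'
    return chat_template.format(input=user_input)


print(format_phi3_prompt("give a brief overview of the heliocentric view"))
```

The same template applies regardless of which execution provider (CPU, CUDA, or DirectML) the model folder targets.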
Thanks, I tried both the phi3 mini and mistral CPU versions and they work just fine.
I am encountering similar issues with long prompts on both the Phi-3-mini-4k-instruct-onnx model (see https://github.com/microsoft/onnxruntime-genai/issues/549 for more details) and the Phi-3-mini-128k-instruct-onnx model (see https://github.com/microsoft/onnxruntime-genai/issues/556 for more details) when inferencing on DirectML with the example phi3-qa.py script.
Package Version: onnxruntime-genai-directml 0.3.0rc2
GPU: RTX 3090
Why can I not run the DirectML model directml\directml-int4-awq-block-128 on an Intel iGPU?
Hardware: Intel Core Ultra 9 185H with Arc GPU. System: Windows 11 23H2.
My laptop doesn't have an NVIDIA GPU, but onnxruntime_genai still requires the CUDA libraries:
Traceback (most recent call last):
File "C:\Users\rocke\github\onnxruntime\phi3-qa.py", line 1, in <module>
import onnxruntime_genai as og
File "C:\Users\rocke\AppData\Roaming\Python\Python311\site-packages\onnxruntime_genai\__init__.py", line 13, in <module>
_dll_directory.add_dll_directory()
File "C:\Users\rocke\AppData\Roaming\Python\Python311\site-packages\onnxruntime_genai\_dll_directory.py", line 21, in add_dll_directory
raise ImportError("Could not find the CUDA libraries. Please set the CUDA_PATH environment variable.")
ImportError: Could not find the CUDA libraries. Please set the CUDA_PATH environment variable.
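For context, the failing import is roughly equivalent to the following check: the package looks up the CUDA_PATH environment variable and refuses to load when it is unset. This is a simplified sketch of that behavior, not the actual library code:

```python
import os


def find_cuda_dll_dir() -> str:
    # Mirror the kind of check that raises the ImportError above:
    # a CUDA build of the package expects CUDA_PATH to point at the
    # CUDA toolkit so it can locate the CUDA DLLs under its bin folder.
    cuda_path = os.environ.get("CUDA_PATH")
    if not cuda_path:
        raise ImportError(
            "Could not find the CUDA libraries. "
            "Please set the CUDA_PATH environment variable."
        )
    return os.path.join(cuda_path, "bin")
```

The surprise in this report is that a DirectML install performs a CUDA-style check at all, which is what makes it a packaging bug rather than a user error.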
Try installing the previous version, 0.2.0, or compiling from source; that will solve this issue. However, you might then run into the issue I'm facing.
We can repro those two issues (the long-prompt failure and the DLL loading issue) and are working on a fix.
Thanks, Yufeng
From: Ofir Zafrir — Re: [microsoft/onnxruntime-genai] Phi-3-Mini fails to execute on long prompts on Intel integrated GPU with DirectML (Issue #570)
This long-prompt issue should be resolved with the 0.3.0 release, so I'll close this issue now. Please let us know if you still see it and we will re-open the investigation.
For the DLL loading issue, please see https://github.com/microsoft/onnxruntime-genai/issues/555.
I tried the 0.3.0 release; the error message has changed:
The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.
Could you please share the complete error and steps to reproduce?
Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlCommittedResourceAllocator.cpp(22)\onnxruntime.dll!00007FFCED3321E1: (caller: 00007FFCED31423C) Exception(1) tid(6008) 887A0005 The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action. '
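As a side note, the 887A0005 code in these errors is a DXGI HRESULT: DXGI_ERROR_DEVICE_REMOVED, which is exactly the condition the "GPU device instance has been suspended" message describes. A small sketch that decodes the handful of device-removal HRESULTs (values taken from the DXGI headers; the helper name is made up for illustration):

```python
# DXGI device-removal HRESULTs, as defined in the DXGI headers.
DXGI_ERRORS = {
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
    0x887A0006: "DXGI_ERROR_DEVICE_HUNG",
    0x887A0007: "DXGI_ERROR_DEVICE_RESET",
    0x887A0020: "DXGI_ERROR_DRIVER_INTERNAL_ERROR",
}


def decode_hresult(code: int) -> str:
    # Map a known DXGI HRESULT to its symbolic name; otherwise
    # print the raw value in the usual 0xXXXXXXXX form.
    return DXGI_ERRORS.get(code, f"unknown HRESULT 0x{code:08X}")


print(decode_hresult(0x887A0005))  # DXGI_ERROR_DEVICE_REMOVED
```

In a D3D12 application, GetDeviceRemovedReason on the device would return one of these same codes after the failure.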
NuGets: Microsoft.ML.OnnxRuntime.DirectML 1.18.0, Microsoft.ML.OnnxRuntimeGenAI.DirectML 0.3.0
I'm using Phi-3-mini-4k-instruct-onnx\directml\directml-int4-awq-block-128 on an Intel Ultra 165U GPU. This small prompt works fine:
"give a brief overview of the heliocentric view of the solar system"
With this slightly larger prompt, I get the error:
"I'm trying to create a longer prompt to do some testing. How long is this prompt exactly, can you count tokens?"
Same issue here. Short prompts (up to ~200 tokens) work well; however, if I increase the number of tokens in the input I get the following error:
OrtException: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlCommittedResourceAllocator.cpp(22)\onnxruntime.dll!00007FFB00EA21E1: (caller: 00007FFB00E8423C) Exception(1) tid(2594) 887A0005 The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.
I am running on an Intel Core Ultra 155H with the latest GPU driver available.
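Since the failure threshold is being described in tokens (~200 here), a quick way to ballpark a prompt's length without loading the model's tokenizer is a word-count heuristic. This is a crude estimate (assuming roughly 0.75 English words per token, a common rule of thumb), not the actual Phi-3 tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: English prose averages about 0.75 words per
    # token with typical LLM tokenizers, so words / 0.75 gives a
    # ballpark token count. Use the real tokenizer for exact numbers.
    words = len(text.split())
    return max(1, round(words / 0.75))


print(estimate_tokens("give a brief overview of the heliocentric view"))
```

For exact counts, encode the prompt with the tokenizer shipped alongside the ONNX model and take the length of the resulting ID sequence.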