microsoft / onnxruntime-genai


Phi-3-Mini fails to execute on long prompts on Intel integrated GPU with DirectML #570

Open ofirzaf opened 1 month ago

ofirzaf commented 1 month ago

When I run short prompts (up to ~200 tokens), everything works well. However, if I increase the number of tokens in the input, I get the following error:

Output: 2024-06-03 08:12:09.7100776 [E:onnxruntime:onnxruntime-genai, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running DmlFusedNode_0_0 node. Name:'DmlFusedNode_0_0' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(1060)\onnxruntime.dll!00007FFD50C4AB39: (caller: 00007FFD50CD96AE) Exception(2) tid(3218) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.
Traceback (most recent call last):
  File "C:\Users\dungeon\onnxruntime-genai\examples\python\phi3-qa.py", line 93, in <module>
    main(args)
  File "C:\Users\dungeon\onnxruntime-genai\examples\python\phi3-qa.py", line 56, in main
    generator.compute_logits()
onnxruntime_genai.onnxruntime_genai.OrtException: Non-zero status code returned while running DmlFusedNode_0_0 node. Name:'DmlFusedNode_0_0' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(1060)\onnxruntime.dll!00007FFD50C4AB39: (caller: 00007FFD50CD96AE) Exception(2) tid(3218) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

I am running on an Intel Core Ultra 155H with the latest available GPU driver.
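For reference, a minimal sketch of the generation loop, along the lines of the phi3-qa.py example and the 0.3.x-era Python API (the model path and prompt text are placeholders):

```python
import onnxruntime_genai as og

# Placeholder path: point this at the downloaded DirectML model folder.
model = og.Model("Phi-3-mini-4k-instruct-onnx/directml/directml-int4-awq-block-128")
tokenizer = og.Tokenizer(model)

# Placeholder prompt body; failures start once the input grows past ~200 tokens.
prompt = "<|user|>\n" + "..." + " <|end|>\n<|assistant|>"
input_tokens = tokenizer.encode(prompt)
print(f"prompt length: {len(input_tokens)} tokens")

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)
params.input_ids = input_tokens

generator = og.Generator(model, params)
# The exception above surfaces in compute_logits() while prefilling a long prompt.
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```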

jorisdg commented 1 month ago

I'm running into the same issue with an Intel Ultra 165U. Prompts that DO work on an older Intel machine with an Iris GPU give me the same error here, even at smaller sizes. So perhaps this is something memory-related in the Intel GPU drivers? The "shared memory" shown in Windows Task Manager doesn't seem to actually represent what's going on behind the scenes.

I'm using this from C# via the Microsoft.ML.OnnxRuntimeGenAI.DirectML package, and I'm not sure if there's a way to force execution onto the CPU so I can check whether that works (I would expect so?).

yufenglee commented 1 month ago

@jorisdg, you can download the phi3-mini cpu model and try it out on CPU: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx/tree/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4
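In case it helps, onnxruntime-genai picks the execution provider from the genai_config.json inside the model folder, so loading the CPU model directory is effectively how you "force" CPU. A sketch with the Python bindings, assuming the CPU folder linked above has been downloaded locally:

```python
import onnxruntime_genai as og

# Assumed local path to the CPU int4 model linked above.
model = og.Model("Phi-3-mini-4k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
tokenizer = og.Tokenizer(model)
# ...then run the same generation loop as with the DirectML model.
```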

jorisdg commented 1 month ago

Thanks, I tried both the phi3 mini and mistral CPU versions and they work just fine.

jackylu0124 commented 1 month ago

I am encountering similar issues with long prompts on both the Phi-3-mini-4k-instruct-onnx model (see issue https://github.com/microsoft/onnxruntime-genai/issues/549 for more details) and the Phi-3-mini-128k-instruct-onnx model (see issue https://github.com/microsoft/onnxruntime-genai/issues/556 for more details) when running inference on DirectML with the example phi3-qa.py script.

Package version: onnxruntime-genai-directml 0.3.0rc2
GPU: RTX 3090

rockets-cn commented 1 month ago

Why can't I run the DirectML model directml\directml-int4-awq-block-128 on an Intel iGPU?

Hardware: Intel Core Ultra 9 185H with Arc GPU
System: Windows 11 23H2

My laptop doesn't have an NVIDIA GPU, but onnxruntime_genai still requires the CUDA libraries:

Traceback (most recent call last):
  File "C:\Users\rocke\github\onnxruntime\phi3-qa.py", line 1, in <module>
    import onnxruntime_genai as og
  File "C:\Users\rocke\AppData\Roaming\Python\Python311\site-packages\onnxruntime_genai\__init__.py", line 13, in <module>
    _dll_directory.add_dll_directory()
  File "C:\Users\rocke\AppData\Roaming\Python\Python311\site-packages\onnxruntime_genai\_dll_directory.py", line 21, in add_dll_directory
    raise ImportError("Could not find the CUDA libraries. Please set the CUDA_PATH environment variable.")
ImportError: Could not find the CUDA libraries. Please set the CUDA_PATH environment variable.
ofirzaf commented 1 month ago

> Why can't I run the DirectML model directml\directml-int4-awq-block-128 on an Intel iGPU? [...]

Try installing the previous version, 0.2.0, or compiling from source; that will solve this issue, though you might then hit the issue I'm facing.

yufenglee commented 1 month ago

We can repro those two issues (long prompt and DLL loading) and are working on a fix.

Thanks, Yufeng



baijumeswani commented 3 weeks ago

The long-prompt issue should be resolved with the 0.3.0 release, so I'll close this issue now. But please let us know if you still see it and we will re-open the investigation.

For the DLL loading issue, please see https://github.com/microsoft/onnxruntime-genai/issues/555.

jorisdg commented 2 weeks ago

I tried the 0.3.0 release; the error message has changed:

The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.

baijumeswani commented 2 weeks ago

Could you please share the complete error and steps to reproduce?

jorisdg commented 2 weeks ago

Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlCommittedResourceAllocator.cpp(22)\onnxruntime.dll!00007FFCED3321E1: (caller: 00007FFCED31423C) Exception(1) tid(6008) 887A0005 The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.'

NuGet packages:
Microsoft.ML.OnnxRuntime.DirectML: 1.18.0
Microsoft.ML.OnnxRuntimeGenAI.DirectML: 0.3.0

I'm using Phi-3-mini-4k-instruct-onnx\directml\directml-int4-awq-block-128 on an Intel Ultra 165U GPU.

When I use this small prompt, it works fine: "give a brief overview of the heliocentric view of the solar system"

When I use this slightly larger prompt, I get the error: "I'm trying to create a longer prompt to do some testing. How long is this prompt exactly, can you count tokens?"
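One way to pin down the failure threshold is to compare the token counts of the working and failing prompts with the model's tokenizer. A sketch using the Python bindings, assuming the same model folder as above:

```python
import onnxruntime_genai as og

model = og.Model("Phi-3-mini-4k-instruct-onnx/directml/directml-int4-awq-block-128")
tokenizer = og.Tokenizer(model)

working = "give a brief overview of the heliocentric view of the solar system"
failing = ("I'm trying to create a longer prompt to do some testing. "
           "How long is this prompt exactly, can you count tokens?")

# Print each prompt's length in tokens to see where the error kicks in.
for name, prompt in (("working", working), ("failing", failing)):
    print(name, len(tokenizer.encode(prompt)), "tokens")
```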

ofirzaf commented 2 weeks ago

Same issue here:

OrtException: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlCommittedResourceAllocator.cpp(22)\onnxruntime.dll!00007FFB00EA21E1: (caller: 00007FFB00E8423C) Exception(1) tid(2594) 887A0005 The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.