microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

Error running phi-3 vision directml P5000 gpu #822

Open · elephantpanda opened this issue 3 weeks ago

elephantpanda commented 3 weeks ago

I am running the Phi-3 vision DirectML tutorial code on an NVIDIA Quadro P5000 GPU (16 GB VRAM) with 12 GB of system RAM on Windows 10, but it fails when I pass an image path:

[screenshot: error raised when an image path is supplied]

It works without putting an image there.

[screenshot: the same code runs successfully with no image]

I have tried both JPG and PNG images. Here is my test image: [attached]
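
For reference, this is roughly what I am running (a minimal sketch based on the phi3v.py tutorial script; the model folder, image path, and prompt template are placeholders for my local setup):

```python
import onnxruntime_genai as og

# Placeholder paths; substitute the local DirectML model folder and test image.
model = og.Model("Phi-3-vision-128k-instruct-onnx-directml")
processor = model.create_multimodal_processor()
tokenizer_stream = processor.create_stream()

image = og.Images.open("test.png")  # same failure with a .jpg
prompt = "<|user|>\n<|image_1|>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=image)

params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=3072)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()  # the DML run dies here once an image is attached
    generator.generate_next_token()
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```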

Any ideas what could be wrong?

I have 16 GB of GPU VRAM and 12 GB of system RAM, and the run is only using about half of each, so I don't think memory is the problem.

Come to think of it, the Phi-3 vision tutorial doesn't say it supports DML yet, even though a DML model has been published. It says "Support for DirectML is coming soon!", but it's not clear how soon that is.

I tried it in C# and got the same error ☹

```
OnnxRuntimeGenAIException: Non-zero status code returned while running MemcpyToHost node. Name:'Memcpy_token_5' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2557)\onnxruntime.dll!00007FF8171EFC45: (caller: 00007FF81780254D) Exception(9) tid(5324) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess (System.IntPtr nativeResult) (at D:/a/_work/1/onnxruntime-genai/src/csharp/Result.cs:26)
Microsoft.ML.OnnxRuntimeGenAI.Generator.ComputeLogits () (at D:/a/_work/1/onnxruntime-genai/src/csharp/Generator.cs:25)
```

I feel like my specs meet or exceed the recommended ones. (I also tried the CPU-only version: it works, but it is incredibly slow, e.g. it takes 5+ minutes to get a response even with a very small image. Oddly, the image size doesn't seem to make a difference(!). I'm not sure how the vision part works; is it iterating over every small patch or something?)

elephantpanda commented 3 weeks ago

I think it is something to do with input length, since I can trigger the same bug by making a really long prompt like this: for (int i = 0; i < 500; i++) prompt += " cat"; (see the sketch below). But it is a bit weird to get a memory bug when it is only using half my VRAM and RAM. As others have noted, it seems to have problems with long contexts and memory bugs.
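
A sketch of that repro in Python, assuming the 0.4-era API (the prompt template and max_length are my guesses; og.Tokenizer is only there to show the rough input length):

```python
import onnxruntime_genai as og

model = og.Model("Phi-3-vision-128k-instruct-onnx-directml")  # placeholder path
tokenizer = og.Tokenizer(model)

# No image at all: a long text-only prompt triggers the same failure.
prompt = "<|user|>\n" + " cat" * 500 + "<|end|>\n<|assistant|>\n"
print(len(tokenizer.encode(prompt)), "input tokens")

params = og.GeneratorParams(model)
params.input_ids = tokenizer.encode(prompt)
params.set_search_options(max_length=2048)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()  # same 887A0006 device-removed error here
    generator.generate_next_token()
```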

[Actually the problem with certain prompts causing an error seems to be a different bug]

As an aside, the image seems to be compressed to about 2,500 tokens (a 50×50 grid?). Is there a way to lower this for smaller images?

elephantpanda commented 2 weeks ago

The same bug is present in version 0.4.0.

elephantpanda commented 2 weeks ago

See also here for running just the vision part of the model in onnxruntime.
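
For anyone wanting to poke at the vision component in isolation, a minimal sketch with plain onnxruntime (the .onnx file name is a placeholder for whichever vision-component file ships in the model folder; printing the session I/O avoids guessing tensor names and shapes):

```python
import onnxruntime as ort

# Placeholder file name; use the vision-component .onnx from the model folder.
sess = ort.InferenceSession(
    "phi-3-v-128k-instruct-vision.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

# Inspect expected inputs/outputs before building a feed dict.
for i in sess.get_inputs():
    print("input:", i.name, i.shape, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.shape, o.type)
```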