Open: elephantpanda opened this issue 3 weeks ago
I think it is something to do with input length, since I can get the same bug by making a really long prompt like this:
for (int i = 0; i < 500; i++) prompt += " cat";
But it is a bit weird to get a memory bug when it is only using half my VRAM and RAM. As other people have noted, it seems to have problems with long contexts and memory.
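For what it's worth, the same repro can be sketched in Python (the language of the tutorial script); this only builds the prompt string, since the exact model-loading calls depend on the tutorial version:

```python
# Build the same long prompt as the C# loop above: 500 copies of " cat".
# Each repetition is roughly one token, so this should be long enough to
# hit the long-context path.
prompt = ""
for _ in range(500):
    prompt += " cat"

print(len(prompt.split()))  # 500 words
```

Feeding this `prompt` into the tutorial script in place of the normal text input reproduces the error for me.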
[Actually, the problem with certain prompts causing an error seems to be a different bug.]
As an aside, the image seems to be compressed to about 2500 tokens (50x50?). Is there a way to lower this for smaller images?
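As a back-of-the-envelope guess at where ~2500 tokens could come from (the 50x50 grid is my assumption, not from the docs): if the preprocessor maps every image onto a fixed grid of patches, the token count is just the grid area, independent of the source image size.

```python
# Hypothetical patch-grid token estimate. If the image is always resized
# to a fixed grid_w x grid_h grid of patches, the token count is constant,
# which would also explain why smaller images are no faster to process.
def image_token_estimate(grid_w: int, grid_h: int) -> int:
    return grid_w * grid_h

print(image_token_estimate(50, 50))  # 2500
```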
The same bug is present in version 0.4.0.
See also here for running just the vision part of the model in onnxruntime.
I am running the Phi-3 vision DirectML tutorial code on an NVIDIA Quadro P5000 GPU (16GB VRAM) plus 12GB system RAM (Windows 10), but it fails when I try to put an image path in there:
It works without putting an image there.
I have tried both jpg and png images. Here is my image:
Any ideas what could be wrong?
I have 16GB of GPU RAM and 12GB of system RAM, and it's only using about half of that, so I don't think memory capacity is the problem.
Come to think of it, the Phi-3 vision tutorial doesn't say it supports DML yet, even though there is a DML model. It says "Support for DirectML is coming soon!", but I'm not sure how soon that means.
I tried it in C# and got the same error ☹
I feel like my specifications are above the recommended ones. (I also tried the CPU-only version; it works but is incredibly slow, e.g. 5+ minutes to get a response even with a very small image. Oddly, the image size doesn't seem to make a difference(!). I'm not sure how the vision part works. Is it iterating over every small patch or something?)