microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

Error while running Phi-3 with DML #336

Closed tomas-pet closed 3 months ago

tomas-pet commented 5 months ago

Here was my input command: python model-qa.py -m Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128 -l 2048

Here is the error I am getting: Input: hi

Output:

Traceback (most recent call last):
  File "model-qa.py", line 82, in <module>
    main(args)
  File "model-qa.py", line 47, in main
    generator.compute_logits()
onnxruntime_genai.onnxruntime_genai.OrtException: Failed to parse the cuda graph annotation id: -1

PatriceVignola commented 5 months ago

Hi @tomas-pet, which version of onnxruntime-genai-directml do you have?

tomas-pet commented 5 months ago

I am using the latest version.

tomas-pet commented 5 months ago

Any update on this?

natke commented 5 months ago

@tomas-pet We need some more information to try to reproduce this.

Can you share the output of pip list?

Which model are you using? Did you download it from HuggingFace?

Can you share the genai_config.json file from the model folder, please?

tomas-pet commented 5 months ago

Here is the output of pip list:

Package Version

accelerate 0.29.2
aiohttp 3.9.3
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
auto-gptq 0.7.1
certifi 2024.2.2
charset-normalizer 3.3.2
cmake 3.29.1
colorama 0.4.6
coloredlogs 15.0.1
datasets 2.18.0
diffusers 0.27.2
dill 0.3.8
filelock 3.13.3
flatbuffers 24.3.25
frozenlist 1.4.1
fsspec 2024.2.0
gekko 1.1.0
huggingface-hub 0.22.2
humanfriendly 10.0
idna 3.6
importlib-metadata 7.1.0
inquirerpy 0.3.4
Jinja2 3.1.3
MarkupSafe 2.1.5
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.1
numpy 1.24.4
onnx 1.16.0
onnxruntime-directml 1.17.3
onnxruntime-genai 0.1.0
onnxruntime-genai-directml 0.2.0rc4
optimum 1.18.0
ort-nightly-qnn 1.18.0.dev20240428001
packaging 24.0
pandas 2.0.3
peft 0.10.0
pfzy 0.3.4
pillow 10.3.0
pip 21.1.1
prompt-toolkit 3.0.43
protobuf 5.26.1
psutil 5.9.8
pyarrow 15.0.2
pyarrow-hotfix 0.6
pyreadline3 3.4.1
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
regex 2023.12.25
requests 2.31.0
rouge 1.0.1
safetensors 0.4.2
sentencepiece 0.2.0
setuptools 56.0.0
six 1.16.0
sympy 1.12
tokenizers 0.15.2
torch 2.2.2
tqdm 4.66.2
transformers 4.40.0.dev0
typing-extensions 4.10.0
tzdata 2024.1
urllib3 2.2.1
wcwidth 0.2.13
xxhash 3.4.1
yarl 1.9.4
zipp 3.18.1

I am following the instructions from here: https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi-3-tutorial.md . I am using the Phi-3-mini-128k-instruct-onnx model, and I used the exact command from the tutorial to download it: git clone https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx

Attached: genai_config.json

PatriceVignola commented 5 months ago

Hi @tomas-pet,

Which hardware are you using? I see that you have the ort-nightly-qnn package installed, but onnxruntime-genai-directml doesn't officially support ARM builds yet. It might work due to the x64 emulation layer, but it will probably use a lot more memory than expected and won't be nearly as performant, and it might break in unexpected ways.

Nevertheless, we recently made changes to adapter selection that could potentially fix your issue. You can test it out by building from source.

tomas-pet commented 4 months ago

I am still getting the same error. The problem is in your phi3-qa.py. Look at this line: params.try_use_cuda_graph_with_max_batch_size(1)

By default, this assumes I am using CUDA.

PatriceVignola commented 4 months ago

params.try_use_cuda_graph_with_max_batch_size(1) is not the issue here. The name is misleading, but it also enables graph capture for DML; it is probably something we should rename eventually.
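To illustrate the point that the option is provider-agnostic despite its CUDA-centric name, here is a minimal standalone sketch: a script could inspect the model's genai_config.json to see which execution provider it targets before enabling graph capture. The helper name and the exact config layout below are assumptions for illustration, not part of the onnxruntime-genai API.

```python
import json

def uses_graph_capable_provider(config_text: str) -> bool:
    """Return True if the genai_config.json targets CUDA or DML.

    Assumes provider entries live under
    model.decoder.session_options.provider_options as a list of
    {"<provider name>": {...}} dicts (illustrative, verify against
    your actual config file).
    """
    cfg = json.loads(config_text)
    provider_options = (cfg.get("model", {})
                           .get("decoder", {})
                           .get("session_options", {})
                           .get("provider_options", []))
    names = {name for entry in provider_options for name in entry}
    return bool(names & {"cuda", "dml"})

# A DML model config would report True; a CPU-only one, False.
dml_config = '{"model": {"decoder": {"session_options": {"provider_options": [{"dml": {}}]}}}}'
print(uses_graph_capable_provider(dml_config))  # prints True
```

A guard like this would let a single script call the graph-capture option only when the loaded model actually runs on a graph-capable provider.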

Can you tell me which hardware you're trying to run on? We don't support ARM builds yet, and although it might work with the x64 emulation layer, it's probably not going to be the best experience even if it does work. We'll be adding ARM builds in the future to have a good native experience on those devices.

Either way, if you tell me which GPU/hardware you're running on, I can try to see if I can reproduce your issue.

natke commented 4 months ago

Hi @tomas-pet, can you please share the hardware you are running on?

baijumeswani commented 3 months ago

Closing this issue since it is not reproducible. Please comment/reopen if you're still seeing the issue.