Hi @tomas-pet, which version of onnxruntime-genai-directml do you have?
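(You can check with `pip show onnxruntime-genai-directml`.)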
I am using the latest.
Any update on this?
@tomas-pet We need some more information to try to reproduce this.
Can you share the output of `pip list`?
Which model are you using? Did you download it from HuggingFace?
Can you also share the genai_config.json file from the model folder, please?
Here is the output of `pip list`:

```
Package                     Version
accelerate                  0.29.2
aiohttp                     3.9.3
aiosignal                   1.3.1
async-timeout               4.0.3
attrs                       23.2.0
auto-gptq                   0.7.1
certifi                     2024.2.2
charset-normalizer          3.3.2
cmake                       3.29.1
colorama                    0.4.6
coloredlogs                 15.0.1
datasets                    2.18.0
diffusers                   0.27.2
dill                        0.3.8
filelock                    3.13.3
flatbuffers                 24.3.25
frozenlist                  1.4.1
fsspec                      2024.2.0
gekko                       1.1.0
huggingface-hub             0.22.2
humanfriendly               10.0
idna                        3.6
importlib-metadata          7.1.0
inquirerpy                  0.3.4
Jinja2                      3.1.3
MarkupSafe                  2.1.5
mpmath                      1.3.0
multidict                   6.0.5
multiprocess                0.70.16
networkx                    3.1
numpy                       1.24.4
onnx                        1.16.0
onnxruntime-directml        1.17.3
onnxruntime-genai           0.1.0
onnxruntime-genai-directml  0.2.0rc4
optimum                     1.18.0
ort-nightly-qnn             1.18.0.dev20240428001
packaging                   24.0
pandas                      2.0.3
peft                        0.10.0
pfzy                        0.3.4
pillow                      10.3.0
pip                         21.1.1
prompt-toolkit              3.0.43
protobuf                    5.26.1
psutil                      5.9.8
pyarrow                     15.0.2
pyarrow-hotfix              0.6
pyreadline3                 3.4.1
python-dateutil             2.9.0.post0
pytz                        2024.1
PyYAML                      6.0.1
regex                       2023.12.25
requests                    2.31.0
rouge                       1.0.1
safetensors                 0.4.2
sentencepiece               0.2.0
setuptools                  56.0.0
six                         1.16.0
sympy                       1.12
tokenizers                  0.15.2
torch                       2.2.2
tqdm                        4.66.2
transformers                4.40.0.dev0
typing-extensions           4.10.0
tzdata                      2024.1
urllib3                     2.2.1
wcwidth                     0.2.13
xxhash                      3.4.1
yarl                        1.9.4
zipp                        3.18.1
```
I am following the instructions here: https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi-3-tutorial.md. I am using the Phi-3-mini-128k-instruct-onnx model, downloaded exactly as the tutorial describes:

```
git clone https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx
```
Attached is genai_config.json.
Hi @tomas-pet,
Which hardware are you using? I see that you have the ort-nightly-qnn package installed, but onnxruntime-genai-directml doesn't officially support ARM builds yet. It might work due to the x64 emulation layer, but it will probably use a lot more memory than expected, won't be nearly as performant, and might break in unexpected ways.
Nevertheless, we recently made changes to adapter selection that could potentially fix your issue. You can test it out by building from source.
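(If you go that route: clone https://github.com/microsoft/onnxruntime-genai and run its build.py with the DirectML option enabled. I believe the relevant flag is `--use_dml`, but that's an assumption on my part, so please double-check the repo's build instructions for the current flags.)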
Still getting the same error. The problem is in your phi3-qa.py. Look at this line: `params.try_use_cuda_graph_with_max_batch_size(1)`. By default this assumes I am using CUDA.
`params.try_use_cuda_graph_with_max_batch_size(1)` is not the issue here. The name is misleading, but it also enables the DML graph (probably something we should rename eventually).
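For context, here is roughly what the example script does around that call; this is a minimal sketch assuming the 0.2.x Python API, and the model path and prompt below are placeholders taken from this thread:

```python
import onnxruntime_genai as og

# Placeholder: the DirectML model folder from the tutorial.
model = og.Model("Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
# Despite the "cuda" in the name, this also enables graph capture when
# running on the DML execution provider.
params.try_use_cuda_graph_with_max_batch_size(1)
params.set_search_options(max_length=2048)
params.input_ids = tokenizer.encode("<|user|>\nhi <|end|>\n<|assistant|>")

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()  # the call that raises the OrtException in this issue
    generator.generate_next_token()
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```

So the call itself is expected on the DML package as well.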
Can you tell me which hardware you're trying to run on? We don't support ARM builds yet, and although it might work with the x64 emulation layer, it's probably not going to be the best experience even if it does work. We'll be adding ARM builds in the future to have a good native experience on those devices.
Either way, if you tell me which GPU/hardware you're running on, I can try to see if I can reproduce your issue.
Hi @tomas-pet, can you please share the hardware you are running on?
Closing this issue since it is not reproducible. Please comment/reopen if you're still seeing the issue.
Here was my input command:

```
python model-qa.py -m Phi-3-mini-128k-instruct-onnx/directml/directml-int4-awq-block-128 -l 2048
```
Here is the error I am getting:

```
Input: hi
Output: Traceback (most recent call last):
  File "model-qa.py", line 82, in <module>
    main(args)
  File "model-qa.py", line 47, in main
    generator.compute_logits()
onnxruntime_genai.onnxruntime_genai.OrtException: Failed to parse the cuda graph annotation id: -1
```