microsoft / Olive

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
https://microsoft.github.io/Olive/
MIT License
1.51k stars 159 forks

LLM Optimization with DirectML reply only displays "O"s #1282

Open yichunx1 opened 1 month ago

yichunx1 commented 1 month ago

Describe the bug I followed all the steps from LLM Optimization with DirectML. I was able to find the ONNX model and start the Gradio UI, but no matter what I enter in the chat box, the reply is always a series of "O"s, as shown in the attached screenshot (bug_screenshot).

To Reproduce I followed this for setup: https://github.com/microsoft/Olive/blob/main/examples/README.md#important I also pip-installed pillow because it is not listed in requirements.txt. Then I followed this for the ONNX conversion and to run the chat app: https://github.com/microsoft/Olive/tree/main/examples/directml/llm I also tried gradio 4.29.0, but it does not seem to be compatible.

Expected behavior The reply should be text instead of "O"s.

Olive config Add Olive configurations here.

Olive logs Add logs here.

Other information

Additional context Add any other context about the problem here.

jambayk commented 1 month ago

@PatriceVignola do you have any insights on this?

yichunx1 commented 1 month ago

I just tried different models. With Phi-3 mini 128k, the answer is a few lines of "////////"s. With Mistral 7B, I get an error saying that the model cannot be found, even though I can see the optimized model in the folder alongside the others.
Fortunately, with Gemma 7B the output is normal. Does anyone know why?

PatriceVignola commented 1 month ago

@yichunx1 Which GPU are you using? And which onnxruntime-directml version are you using?
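
For anyone gathering these details, a minimal sketch along the following lines could collect the package versions in one go. It is not part of the Olive example. The helper names `package_version` and `collect_environment_report` are hypothetical, and the package list is an assumption about what is relevant here. It does not detect the GPU model, which can be read from Device Manager or `dxdiag` on Windows.

```python
from importlib import metadata


def package_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def collect_environment_report(
    packages=("onnxruntime-directml", "onnxruntime", "olive-ai", "gradio"),
):
    """Collect package versions and the ONNX Runtime execution providers.

    The package list is an assumption about what matters for this issue.
    """
    report = {name: package_version(name) for name in packages}
    try:
        import onnxruntime as ort

        # "DmlExecutionProvider" should appear here when the DirectML build is active.
        report["providers"] = ort.get_available_providers()
    except ImportError:
        # onnxruntime is not importable in this environment.
        report["providers"] = None
    return report


if __name__ == "__main__":
    for key, value in collect_environment_report().items():
        print(f"{key}: {value}")
```

If both `onnxruntime` and `onnxruntime-directml` show a version, the two packages may be shadowing each other, which is worth mentioning when replying.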