Open · yichunx1 opened this issue 3 months ago
@PatriceVignola do you have any insights on this?
I just tried different models.
When I use phi3 mini 128k, the answer is a few lines of "////////"s.
When I try Mistral 7b, it throws an error saying the model cannot be found (even though I can see the optimized model sitting in the folder with the others; see the check below).
Fortunately, when I go for gemma 7b, the output is normal.
Does anyone know why?
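A quick way to double-check what Olive actually produced for Mistral (the `models/optimized` path below is my guess; point it at the example's real output folder):

```python
from pathlib import Path

# Assumed output folder: the DirectML llm example writes optimized models
# somewhere under the example directory; the exact path may differ.
models_dir = Path("models/optimized")

# List every ONNX file on disk so the chat app's "model cannot be found"
# error can be compared against what actually exists.
for onnx_file in sorted(models_dir.rglob("*.onnx")):
    size_mb = onnx_file.stat().st_size / 1e6
    print(f"{onnx_file}  ({size_mb:.1f} MB)")
```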
@yichunx1 Which GPU are you using? And which onnxruntime-directml version are you using?
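You can check the onnxruntime side with something like this (standard onnxruntime APIs; the GPU model itself shows up in Task Manager or dxdiag):

```python
import onnxruntime as ort

# The onnxruntime-directml package reports its version the same way
# as the CPU package.
print("onnxruntime version:", ort.__version__)

# "DmlExecutionProvider" must appear here for DirectML to be usable.
print("available providers:", ort.get_available_providers())

# Default device for this build (e.g. "CPU", "GPU", or a DML variant).
print("device:", ort.get_device())
```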
Describe the bug
I followed all the steps from LLM Optimization with DirectML. I was able to find the ONNX model and start the Gradio UI, but no matter what I enter in the chat box, the reply is always a series of "O"s, as shown in the attached screenshot.
To Reproduce
1. I followed this for setup: https://github.com/microsoft/Olive/blob/main/examples/README.md#important
2. I also pip installed pillow because it is not in requirements.txt.
3. I then followed this for the ONNX conversion and the chat app: https://github.com/microsoft/Olive/tree/main/examples/directml/llm
4. I also tried gradio 4.29.0, but it does not seem to be compatible.
Expected behavior
The reply should be text instead of "O"s.
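One way to narrow down whether the garbage tokens come from the model itself or from the chat app is to load the optimized model directly and confirm DirectML is actually selected. A minimal sketch, assuming a hypothetical model path (substitute the file Olive produced):

```python
import onnxruntime as ort

# Hypothetical path: point this at the optimized model Olive produced.
model_path = "models/optimized/model.onnx"

# Request DirectML explicitly, with CPU as a fallback; then check which
# provider was actually selected, since a silent CPU fallback is easy
# to miss.
session = ort.InferenceSession(
    model_path,
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

print("providers in use:", session.get_providers())
print("inputs:", [(i.name, i.shape, i.type) for i in session.get_inputs()])
```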