@PatriceVignola could you please help with this?
A quick follow-up on this: I would really appreciate any help or insights!
The issue here is related to a memory leak that was addressed. It should be resolved in ort-genai release 0.3.0.
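Once 0.3.0 is available, upgrading the package should pick up the fix, e.g. `pip install --upgrade onnxruntime-genai-directml` (package name taken from the environment info below).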
I'll close the issue. If you still encounter the problem, please re-open the issue.
I confirm that this issue is now resolved with the 0.3.0 release.
I am running the Phi-3-mini-128k-instruct-onnx model on DirectML with the example phi3-qa.py script (https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3-qa.py). The only change I have made to the script is changing `search_options['max_length'] = 2048` to `search_options['max_length'] = 4096` to allow longer input (see the sketch below).
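For reference, the change amounts to a single line. A minimal sketch follows; the `search_options` dict here is an illustrative stand-in, since in the actual phi3-qa.py script it is assembled from the command-line arguments:

```python
# Illustrative stand-in for the script's search options; in phi3-qa.py
# these are built from command-line arguments.
search_options = {}

# Original value in the example script:
#     search_options['max_length'] = 2048
# Raised so that prompts longer than 2048 tokens fit within the
# total generation length budget:
search_options['max_length'] = 4096
```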
I encountered the issue when the input is long (longer than 2048 tokens). More specifically, below are my input prompt and the error logs:

Input Prompt (2715 tokens):
Error Logs (parts of the exact paths are replaced with xxx for privacy reasons):
The error I encountered here seems very similar to the one I reported in https://github.com/microsoft/onnxruntime-genai/issues/549, even though that issue uses the model with the 4k context window while this one uses the model with the 128k context window. I would really appreciate any insights into, and solutions for, both of these issues!
Package Version:
onnxruntime-genai-directml 0.3.0rc2
GPU: RTX 3090
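For anyone verifying whether they have the fixed build, a quick standard-library check of the installed version (distribution name taken from the package info above):

```python
# Print the installed version of the DirectML package to confirm
# whether the 0.3.0 fix is present.
from importlib.metadata import version

print(version("onnxruntime-genai-directml"))
```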