microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

onnxruntime-genai - QNN Memory Usage #976

Open dhl899 opened 1 month ago

dhl899 commented 1 month ago

Referring to https://github.com/microsoft/onnxruntime-genai/issues/961, will it address the memory aspect of running big models on the NPU? Thanks.

skyline75489 commented 1 month ago

Hi, we're still working on QNN support. The NPU has its own memory limitations, which depend on the specific device's specifications. It will likely not fully address the memory issue of running big models.