quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License
438 stars, 60 forks

What limitation of Hexagon V75 requires the Llama v2 7B Quantized model to be split into 8 bin files? #100

Closed. taeyeonlee closed this issue 6 days ago

taeyeonlee commented 1 week ago

Hello, what limitation of Hexagon V75 requires the Llama v2 7B Quantized model to be split into 8 bin files?

gustavla commented 1 week ago

Hi @taeyeonlee, thanks for your question. It has to do with the address space limitation of the Hexagon. We hope to abstract this away from the user in the future, but for now we have to manage these splits manually.
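
To illustrate the kind of manual split described above, here is a minimal sketch of packing consecutive layers into parts that each stay under a fixed size budget. It is not the AI Hub Models implementation: `split_layers`, the per-layer size, and the budget are all hypothetical. The numbers were picked only so that the toy example produces eight parts, and they do not reflect the real layer sizes or the actual Hexagon address-space limit.

```python
def split_layers(layer_sizes, budget_bytes):
    """Greedily pack consecutive layers into parts.

    Each part is a list of layer indices whose combined size does not
    exceed budget_bytes. Layer order is preserved.
    """
    parts = []
    current = []
    current_size = 0
    for index, size in enumerate(layer_sizes):
        if size > budget_bytes:
            raise ValueError(f"layer {index} ({size} bytes) exceeds the budget")
        # Start a new part when adding this layer would overflow the budget.
        if current and current_size + size > budget_bytes:
            parts.append(current)
            current = []
            current_size = 0
        current.append(index)
        current_size += size
    if current:
        parts.append(current)
    return parts


MIB = 2**20

# Hypothetical values, for illustration only.
layer_sizes = [100 * MIB] * 32   # assume 32 equally sized layers
budget = 450 * MIB               # assumed per-part size budget

parts = split_layers(layer_sizes, budget)
print(len(parts))   # 8
print(parts[0])     # [0, 1, 2, 3]
```

With these assumed values each part holds four layers, because a fifth would exceed the budget. A real split would also have to account for activations, runtime overhead, and whatever constraints the toolchain imposes.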