Sing-Li opened this issue 4 months ago
Indeed. These machines have an NPU and a GPU, yet as far as I know the only way to run LLMs on them is via the CPU (LM Studio).
MLC LLM would make a real splash by releasing support for this.
In any case, you can already test the Adreno GPU of your Copilot+ PC at https://chat.webllm.ai
Yes, WebLLM indeed shows the potential.
Qualcomm seems very confused, unfortunately. With the advent of "small yet tuned for today's applications" LLMs such as Llama 3.2 3B and Gemma 2 2B, good-quality, performant models can now run within mainstream GPU VRAM capacity (8 GB). I recently tested a sub-$800 "gaming notebook" with the mass-marketed 4060, and these models ran plenty fast in LM Studio.
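To see why these small models fit in 8 GB of VRAM, here is a back-of-the-envelope sketch. The 4-bit quantization figure and the ~25% overhead factor for KV cache and activations are assumptions for illustration, not numbers from this thread:

```python
def quantized_model_gb(n_params_billion, bits_per_weight=4, overhead=1.25):
    # Rough memory estimate for a quantized LLM: weight bytes at the given
    # bit width, plus ~25% overhead (assumed) for KV cache and activations.
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# Llama 3.2 3B at 4-bit quantization:
print(quantized_model_gb(3))  # → 1.875 (GB), comfortably under 8 GB
```

Even doubling the overhead assumption leaves these 2-3B models well inside a mainstream 8 GB card.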
🚀 Feature
Add support for arm64 Windows/Linux for the Qualcomm Snapdragon X Elite PCs
Motivation
These Copilot+ PCs are now widely available all over the world. They have the potential to become the basis for AI workstations that run MLC-LLM at the level of Apple M1/M2/M3, WITHOUT having to pay Apple's crazy "memory premium". They all use a unified memory model similar to Apple's, with Qualcomm designing the SoC for maximum memory access bandwidth; configurations range from 8 GB to 64 GB with reasonable incremental price increases.
Alternatives
No other hardware alternative exists as of 2024 (and probably not 2025 either).
Additional context
Vulkan is already supported natively on arm64 for the Adreno GPU cores, but a supported Vulkan SDK for arm64 is still pending. MLC AI also has no nightly build compatible with arm64 (either Windows or Linux) at this time.
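Since no arm64 nightly exists yet, any packaging or wheel-selection script would first need to recognize a Windows/Linux-on-ARM host. A minimal sketch, assuming a hypothetical helper name (not from any MLC script):

```python
import platform

def is_arm64_host(system=None, machine=None):
    """Detect an arm64 Windows/Linux host (e.g. a Snapdragon X Elite
    Copilot+ PC). Hypothetical helper for illustration: a build or
    wheel-selection script could use this to decide whether an
    arm64-compatible nightly applies."""
    system = system or platform.system()                  # "Windows", "Linux", ...
    machine = (machine or platform.machine()).lower()     # "arm64", "aarch64", "amd64", ...
    return machine in ("arm64", "aarch64") and system in ("Windows", "Linux")

# On a Snapdragon X Elite laptop running Windows:
print(is_arm64_host(system="Windows", machine="ARM64"))  # → True
```

The parameters default to the running interpreter's platform, so calling it with no arguments answers the question for the current machine.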
https://www.reddit.com/r/WindowsARM/comments/1bp927h/qualcomm_confirms_native_vulkan_support_in_their/
https://www.xda-developers.com/qualcomm-snapdragon-x-elite-plus-gpu-architecture/
https://vulkan.lunarg.com/issue/view/654594145df11238996e13fe