mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Feature Request] Support for Qualcomm Snapdragon XElite PCs (arm64 Windows and WSL2-Linux) as target #2617

Open Sing-Li opened 4 months ago

Sing-Li commented 4 months ago

🚀 Feature

Add support for arm64 Windows/Linux for the Qualcomm Snapdragon XElite PCs

Motivation

These Copilot+ PCs are now widely available all over the world. They have the potential to become the basis for AI workstations performing at the level of Apple M1/M2/M3 for running MLC-LLM, WITHOUT having to pay Apple's crazy "memory premium". They all use a unified memory model similar to Apple's, with Qualcomm designing the SoC for maximum memory-access bandwidth; configurations range from 8GB to 64GB with reasonable incremental price increases.

Alternatives

No other hardware alternative exists as of 2024 (and probably not 2025 either).

Additional context

Vulkan is already supported natively on arm64 for the Adreno GPU cores, but availability of a supported Vulkan SDK on arm64 is still pending. MLC-LLM also has no nightly build that is compatible with arm64 (either Windows or Linux) at this time.

https://www.reddit.com/r/WindowsARM/comments/1bp927h/qualcomm_confirms_native_vulkan_support_in_their/

https://www.xda-developers.com/qualcomm-snapdragon-x-elite-plus-gpu-architecture/

https://vulkan.lunarg.com/issue/view/654594145df11238996e13fe

manuelpaulo commented 1 month ago

Indeed. These machines have an NPU and a GPU, and yet, as far as I know, the only way to run LLMs on them is via the CPU (LM Studio).

MLC LLM would make a big splash by releasing this.

Anyway, you can already test the Adreno GPU of your Copilot+ PC at https://chat.webllm.ai

Sing-Li commented 1 month ago

Yes, webllm indeed shows the potential.

Qualcomm seems to be very confused, unfortunately. With the advent of "small yet tuned for today's applications" LLMs such as Llama 3.2 3B and Gemma 2 2B, good-quality, performant models can now run within a mainstream GPU's VRAM capacity (8GB). I recently tested a sub-$800 "gaming notebook" with the mass-marketed 4060, and it ran plenty fast with these models on LM Studio.