vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Dockerfile.cpu for aarch64 #8259

Open khayamgondal opened 2 months ago

khayamgondal commented 2 months ago

🚀 The feature, motivation and pitch

Please provide a Dockerfile.cpu for aarch64 systems. I have a GH200 and I want to run CPU-only inference.
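
For illustration, something along these lines is what I have in mind. This is only a sketch modeled on the existing x86 Dockerfile.cpu; the package list, `requirements-cpu.txt`, and the `VLLM_TARGET_DEVICE=cpu` build flag are assumptions here, since there is no ARM CPU backend yet:

```dockerfile
# Hypothetical sketch of a Dockerfile.cpu for aarch64. It assumes a vLLM ARM
# CPU backend exists; package names and build flags mirror the x86
# Dockerfile.cpu and are not verified for aarch64.
FROM ubuntu:22.04

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        git curl gcc-12 g++-12 python3 python3-pip libnuma-dev && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /workspace/vllm
COPY . .

# Build vLLM for CPU only (no CUDA); VLLM_TARGET_DEVICE=cpu selects the CPU backend.
RUN pip3 install --upgrade pip && \
    pip3 install -r requirements-cpu.txt && \
    VLLM_TARGET_DEVICE=cpu pip3 install --no-build-isolation -v .

ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
```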

Alternatives

No response

Additional context

No response


mgoin commented 2 months ago

First we need to implement a CPU backend for ARM. Currently we only have x86 and PowerPC CPU backends.

shahizat commented 2 months ago

Hi @mgoin, I hope you will release a version compatible with arm64 machines, such as the NVIDIA AGX Orin Developer Kit. Looking forward to it!

khayamgondal commented 2 months ago

Yeah, especially now that NVIDIA is pushing harder toward ARM machines.

mgoin commented 2 months ago

To be clear, you can use vLLM with CUDA on ARM machines, i.e. it is easy to get it working on Grace Hopper. We just don't have a vLLM CPU backend for ARM machines.
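
For example, a minimal sketch of what running vLLM with the CUDA backend on an aarch64 host could look like (the base image tag and the source-build step are assumptions, not an officially supported recipe):

```dockerfile
# Hypothetical sketch: vLLM with the CUDA backend on an aarch64 (Grace Hopper)
# host. The base image tag is an assumption; any CUDA-enabled arm64 image with
# a matching PyTorch build should work.
FROM nvcr.io/nvidia/pytorch:24.07-py3

# Build vLLM from source, since prebuilt aarch64 CUDA wheels may not be published.
RUN git clone https://github.com/vllm-project/vllm.git /workspace/vllm
WORKDIR /workspace/vllm
RUN python3 -m pip install --no-build-isolation -v .

ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
```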

khayamgondal commented 2 months ago

Yes, I am aware of that and actually use it. Part of my research focuses on CPU-only inference, which is why I am interested in a vLLM ARM CPU implementation.

— Reply to this email directly, view it on GitHub https://github.com/vllm-project/vllm/issues/8259#issuecomment-2344527564, or unsubscribe https://github.com/notifications/unsubscribe-auth/AATNG3YZPW5S5SJIUVCXPUTZWCJ6JAVCNFSM6AAAAABNZWCFIKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNBUGUZDONJWGQ . You are receiving this because you authored the thread.Message ID: @.***>