vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Dockerfile.cpu for aarch64 #8259

Open khayamgondal opened 2 months ago

khayamgondal commented 2 months ago

🚀 The feature, motivation and pitch

Please provide a Dockerfile.cpu for aarch64 systems. I have a GH200 and I want to run CPU-only inference.
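
For illustration, something along these lines is what I have in mind. This is only a sketch modeled on the existing x86 Dockerfile.cpu; the package list, `requirements-cpu.txt`, and the `VLLM_TARGET_DEVICE=cpu` build flag are assumptions here, since there is no ARM CPU backend yet:

```dockerfile
# Hypothetical sketch of a Dockerfile.cpu for aarch64. It assumes a vLLM ARM
# CPU backend exists; package names and build flags mirror the x86
# Dockerfile.cpu and are not verified for aarch64.
FROM ubuntu:22.04

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        git curl gcc-12 g++-12 python3 python3-pip libnuma-dev && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /workspace/vllm
COPY . .

# Build vLLM for CPU only (no CUDA); VLLM_TARGET_DEVICE=cpu selects the CPU backend.
RUN pip3 install --upgrade pip && \
    pip3 install -r requirements-cpu.txt && \
    VLLM_TARGET_DEVICE=cpu pip3 install --no-build-isolation -v .

ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
```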

Alternatives

No response

Additional context

No response


mgoin commented 2 months ago

First we need to implement a CPU backend for ARM. Currently we only have x86 and PowerPC CPU backends.

shahizat commented 2 months ago

Hi @mgoin, I hope you will release a version compatible with arm64 machines, such as the NVIDIA AGX Orin Developer Kit. Looking forward to it!

khayamgondal commented 2 months ago

Yeah, especially now that NVIDIA is pushing harder toward ARM machines.

mgoin commented 2 months ago

To be clear, you can use vLLM with CUDA on ARM machines, i.e. it is easy to get it working on Grace Hopper. We just don't have a vLLM CPU backend for ARM machines.
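
For example, a minimal sketch of what running vLLM with the CUDA backend on an aarch64 host could look like (the base image tag and the source-build step are assumptions, not an officially supported recipe):

```dockerfile
# Hypothetical sketch: vLLM with the CUDA backend on an aarch64 (Grace Hopper)
# host. The base image tag is an assumption; any CUDA-enabled arm64 image with
# a matching PyTorch build should work.
FROM nvcr.io/nvidia/pytorch:24.07-py3

# Build vLLM from source, since prebuilt aarch64 CUDA wheels may not be published.
RUN git clone https://github.com/vllm-project/vllm.git /workspace/vllm
WORKDIR /workspace/vllm
RUN python3 -m pip install --no-build-isolation -v .

ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
```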

khayamgondal commented 2 months ago

Yes, I am aware of that and actually use it. Part of my research focuses on CPU-only inference, which is why I am interested in a vLLM ARM CPU implementation.

— Reply to this email directly, view it on GitHub https://github.com/vllm-project/vllm/issues/8259#issuecomment-2344527564, or unsubscribe https://github.com/notifications/unsubscribe-auth/AATNG3YZPW5S5SJIUVCXPUTZWCJ6JAVCNFSM6AAAAABNZWCFIKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNBUGUZDONJWGQ . You are receiving this because you authored the thread.Message ID: @.***>