vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Need CPU inferencing support for non-x86 architectures #5741

Open ChipKerchner opened 1 week ago

ChipKerchner commented 1 week ago

🚀 The feature, motivation and pitch

We need vLLM to support CPU inference on the PowerPC architecture. More broadly, this project should start planning for non-x86 platforms.

Initial PowerPC CPU support
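
To make this concrete, a first step for any non-x86 CPU backend is recognizing the host ISA at startup. Below is a minimal, hypothetical sketch (the helper names are illustrative, not part of vLLM's actual API) of how the architecture could be probed in Python:

```python
import platform

# Coarse mapping from platform.machine() values to ISA families.
# Linux reports "ppc64le" on little-endian POWER, "aarch64" on ARM,
# and "x86_64" on x86; macOS uses "arm64" and Windows "AMD64".
_ISA_FAMILIES = {
    "x86_64": "x86",
    "amd64": "x86",
    "ppc64le": "powerpc",
    "ppc64": "powerpc",
    "aarch64": "arm",
    "arm64": "arm",
}


def detect_cpu_arch() -> str:
    """Return the host's ISA family, or 'unknown' if unrecognized."""
    return _ISA_FAMILIES.get(platform.machine().lower(), "unknown")


if __name__ == "__main__":
    # A CPU backend could use this to select ISA-specific kernels
    # (e.g. AVX on x86, NEON on ARM, VSX on POWER) or fail early.
    print(f"Detected CPU architecture: {detect_cpu_arch()}")
```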

Alternatives

No response

Additional context

No response

mgoin commented 1 week ago

I certainly agree we need to expand CPU support to other architectures. ARM in particular would be quite impactful, given its widespread availability and the relatively large memory bandwidth (compared to x86 processors) of Apple's M-series, AWS Graviton, NVIDIA Grace, and Google Axion processors.

I'm only familiar with PowerPC from my past work in HPC, so my understanding may be outdated, but I believe these machines are not easy to get hold of. @ChipKerchner Are public instances of the hardware available so CI can run on it regularly? Could you share more about your group's use case or need?

manojnkumar commented 1 week ago

@mgoin: We do have IBM Power instances in public clouds and universities. We should be able to dedicate an instance for regular CI runs.