vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Request for Ascend NPU support #6368

Open xuedinge233 opened 2 months ago

xuedinge233 commented 2 months ago

🚀 The feature, motivation and pitch

Background

Currently, the project supports various hardware accelerators such as GPUs, but it has no support for NPUs. Adding NPU support could significantly benefit users who have access to these devices, enabling faster and more efficient computation.

Reference Materials

Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. For more information about Ascend, see Ascend Community.

CANN (Compute Architecture for Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI.

PyTorch has officially announced support for the Ascend NPU (through the PrivateUse1 dispatch key); see the PrivateUse1 tutorial here.
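For illustration, here is a minimal sketch of how that integration surfaces to users, assuming the torch_npu extension is installed; the `torch.npu.set_device` entry point and the `"npu"` device name are taken from public torch_npu examples, so treat the details as assumptions:

```python
# Minimal usage sketch, assuming the torch_npu extension is installed.
# Importing torch_npu registers Ascend devices under the "npu" name
# through PyTorch's PrivateUse1 dispatch key.
import torch
import torch_npu  # noqa: F401  (the import side effect registers the backend)

torch.npu.set_device(0)              # select the first Ascend NPU
x = torch.randn(2, 3, device="npu")  # allocate the tensor on the NPU
y = torch.matmul(x, x.T)             # the op dispatches to CANN kernels
print(y.device)                      # -> npu:0
```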

Specific Request

We would like to request the addition of NPU support within the project. To achieve this goal, we will contribute code and provide feedback. This request may require additional resources and effort on your part, but we hope we can count on your help.
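To make the request concrete, here is a hypothetical sketch of what the user-facing side could look like if vLLM's existing `device` engine argument were extended to Ascend; `device="npu"` is an assumed value for illustration, not a supported option today:

```python
# Hypothetical sketch: how NPU selection might look if vLLM's existing
# `device` engine argument gained an "npu" option. The value "npu" is an
# assumption for illustration, not a currently supported setting.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", device="npu")
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["Hello, Ascend!"], params)
print(outputs[0].outputs[0].text)
```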

Alternatives

No response

Additional context

No response

mgoin commented 2 months ago

Duplicate of https://github.com/vllm-project/vllm/issues/1606 and https://github.com/vllm-project/vllm/issues/6066

xuedinge233 commented 2 months ago

> Duplicate of #1606 and #6066

Thanks for your reply. We are ready to develop an Ascend adaptation for the vllm project; once it is complete, we will provide you with a public Ascend prototype.

mgoin commented 2 months ago

That's great to hear @xuedinge233! Having hardware instances available for CI would go a long way toward maintaining support.

xuedinge233 commented 2 months ago

Yeah, we're working on it, and we will upload it as soon as there is progress in development.

BrightXiaoHan commented 1 month ago

> Yeah, we're working on it, and we will upload it as soon as there is progress in development.

Any updates on this?

xuedinge233 commented 1 month ago

> > Yeah, we're working on it, and we will upload it as soon as there is progress in development.
>
> Any updates on this?

We are making steady progress and expect to deliver the first version of the code in a couple of weeks.

zer0py2c commented 1 month ago

Are there plans to support NPU+CANN environments where the CPU uses the aarch64 architecture?

dogeeelin commented 1 month ago

Looking forward to NPU support on vLLM!

MengqingCao commented 3 weeks ago

> Are there plans to support NPU+CANN environments where the CPU uses the aarch64 architecture?

Yes, NPU+MindIE+CANN is planned, and both aarch64 and x86_64 will be supported.

ccly1996 commented 2 weeks ago

> > Yeah, we're working on it, and we will upload it as soon as there is progress in development.
> >
> > Any updates on this?
>
> We are making steady progress and expect to deliver the first version of the code in a couple of weeks.

Excuse me, is there any progress?

zer0py2c commented 2 weeks ago

> > Are there plans to support NPU+CANN environments where the CPU uses the aarch64 architecture?
>
> Yes, NPU+MindIE+CANN is planned, and both aarch64 and x86_64 will be supported.

I mean NPU+vLLM+CANN on an aarch64 CPU, with an NPU device such as the Atlas 300I Duo.

shilei4260 commented 2 weeks ago

@xuedinge233 How long will it take to release the NPU version of vLLM?

xuedinge233 commented 2 weeks ago

> @xuedinge233 How long will it take to release the NPU version of vLLM?

The latest updates on progress can be found in https://github.com/vllm-project/vllm/pull/8054.