vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Inquiry Regarding vLLM Support for Mac Metal API #2081

Open yihong1120 opened 9 months ago

yihong1120 commented 9 months ago

Dear vLLM Maintainers,

I hope this message finds you well. I am reaching out to inquire about the potential for integrating Mac Metal API support within the vLLM framework. As an avid user and advocate for vLLM's capabilities, I have been thoroughly impressed with its performance and flexibility across various platforms and hardware configurations.

Given the increasing prevalence of Mac devices in the machine learning community and the performance benefits offered by Apple's Metal API for GPU-accelerated computing, I am curious to know if there are any plans to extend vLLM's compatibility to include Metal support. This would undoubtedly be a significant boon for researchers and developers working on Mac environments who wish to leverage vLLM's impressive suite of features.

Could you please shed some light on the following aspects:

  1. Are there any ongoing efforts or discussions around incorporating Metal API support into vLLM?
  2. If such plans are in the pipeline, what is the anticipated timeline for the availability of this feature?
  3. How might the community contribute to expediting this process, and are there specific areas where contributions are most needed?

I understand that integrating a new backend such as Metal may present a variety of challenges, but I believe the potential benefits to the user community could be substantial. I am keen to offer my assistance, whether it be through testing, development, or documentation, to help bring this capability to fruition.

Thank you for your time and consideration. I eagerly await your response and am excited about the prospect of further enhancing vLLM's accessibility and performance on Mac platforms.

Best regards, yihong1120

jagtesh commented 7 months ago

Torch has officially supported Metal for a while now. Would adding support in vLLM be as simple as changing device="cuda" to "mps" on Macs? Are there any other dependencies on CUDA?

jagtesh commented 7 months ago

> Torch has officially supported Metal for a while now. Would adding support in vLLM be as simple as changing device="cuda" to "mps" on Macs? Are there any other dependencies on CUDA?

Anyone? I'd be happy to rewrite the implementation without the hardcoded device name - just don't want to spend hours down a dead-end.
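As a rough illustration of the device swap described above, here is a minimal sketch using only stock PyTorch; the `pick_device` helper is hypothetical and this is not vLLM code. Plain torch ops will follow whichever device this returns, but vLLM also ships custom CUDA kernels (e.g. PagedAttention), so a device-string swap alone likely would not cover those paths, which is exactly the open question here.

```python
import torch

def pick_device() -> torch.device:
    """Return the best available torch device instead of hardcoding "cuda"."""
    # torch.backends.mps.is_available() reports whether the Metal
    # Performance Shaders (MPS) backend can be used on this machine.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(4, 4, device=device)  # allocated on MPS, CUDA, or CPU
print(device, x.sum().item())
```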

C0deMunk33 commented 7 months ago

I'd like to see this work as well, lots of Metal out there

bluenevus commented 7 months ago

same here please

chen-bowen commented 6 months ago

+1

nostaljic commented 5 months ago

Wish it could be implemented🥺

jagtesh commented 5 months ago

My offer still stands if someone on the project can answer the above questions.

hmellor commented 5 months ago

@pathorn says they have an implementation that runs on M3 chips in https://github.com/vllm-project/vllm/issues/176#issuecomment-2023827553

Do you think it could be adapted to the new CPU backend that was added in #3634?

jagtesh commented 4 months ago

> @pathorn says they have an implementation that runs on M3 chips in #176 (comment)
>
> Do you think it could be adapted to the new CPU backend that was added in #3634?

FYI for anyone who wants to see that PR: https://github.com/vllm-project/vllm/pull/2244#issuecomment-1868419884. @pathorn did some tremendous work on it. However, llama.cpp is still faster by a mile, so this may not be a fruitful endeavour after all.