vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Hardware Backend Deprecation Policy #8932

Open youkaichao opened 2 months ago

youkaichao commented 2 months ago

Anything you want to discuss about vllm.

vLLM heavily depends on PyTorch, and also actively works with PyTorch team to leverage their new features. When a new PyTorch version comes out, vLLM usually upgrades to the latest PyTorch directly.

Meanwhile, vLLM supports diverse hardware backends from different vendors. They often require their own PyTorch versions.

To speed up the development of vLLM, we hereby require all vendors to keep up with the latest PyTorch release.

Starting from PyTorch 2.5 (release day 10/17/24), vLLM will drop support for a hardware backend if that backend cannot support PyTorch 2.5.

Potentially affected vendors, and the PyTorch version they currently require:

Note that supporting the latest PyTorch is a necessary condition for vLLM's hardware vendors, but it is not a sufficient one. The vLLM team considers adding new hardware support based on community interest, the priorities of the main branch, and the bandwidth of the team.
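
As a rough illustration (not vLLM's actual code), a backend integration could assert such a minimum PyTorch version at import time roughly like this:

```python
# Rough sketch only (not vLLM's actual code): how a backend integration could
# enforce a minimum PyTorch version at import time.
from packaging.version import Version

import torch

MIN_TORCH = Version("2.5.0")  # hypothetical minimum the backend promises to support

def check_torch_version() -> None:
    # Strip the local build tag, e.g. "2.5.1+cu124" -> "2.5.1".
    installed = Version(torch.__version__.split("+")[0])
    if installed < MIN_TORCH:
        raise RuntimeError(
            f"This backend requires PyTorch >= {MIN_TORCH}, but found {installed}."
        )

check_torch_version()
```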


youkaichao commented 2 months ago

See https://dev-discuss.pytorch.org/t/pytorch-2-5-rc1-is-produced-for-pytorch-audio-vision/2460 for the PyTorch release schedule.

youkaichao commented 2 months ago

vLLM will try to upgrade to PyTorch 2.5 first, and we will leave one to two weeks for hardware vendors to catch up.

ilya-lavrenov commented 2 months ago

OpenVINO has 2.1.2 as its lower-bound PyTorch version (https://github.com/vllm-project/vllm/blob/6c9ba48fdebe2f44c82eabfe136dc8dc6ad6f4ed/requirements-openvino.txt#L5), which means any newer version should also work.

We simply rely on the PyTorch version supported by the HF stack itself (e.g. transformers, tokenizers, optimum, etc.).
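
For illustration, the lower bound just follows the usual pip specifier semantics. A small sketch using the packaging library (not code from the repo):

```python
# Sketch, not repo code: what a lower bound like "torch >= 2.1.2" means in
# practice, using the same packaging semantics pip applies to requirements files.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

lower_bound = SpecifierSet(">=2.1.2")
for candidate in ("2.1.2", "2.4.0", "2.5.1"):
    ok = Version(candidate) in lower_bound
    print(f"torch {candidate}: {'allowed' if ok else 'rejected'}")
```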

youkaichao commented 2 months ago

@ilya-lavrenov that is good to know. Can you change OpenVINO to use the same PyTorch version (currently 2.4) as the default case?

ghchris2021 commented 2 months ago

Re: "intel xpu (2.3.1)"

I don't know almost any context here wrt. many vllm, pytorch specifics. But I believe my understanding is correct that in fact starting with pytorch v2.5 and becoming more complete in some gap areas in v2.6 pytorch will be supporting intel xpu devices including both data center models (I think some such was supported in v2.4) and client ARC / flex series etc. GPUs natively in pytorch without (AFAICT) depending on the IPEX intel pytorch extensions based XPU support.

So if there is anything that is worse supported in pytorch v2.5 than v2.3.1 for intel xpu I don't know or expect it to be so except I do not know what utility could be had if any for using the IPEX based pytorch extensions wrt. xpu in v2.5+ since they might be not so much the relevant / necessary provider of xpu optimized support starting in v2.5.
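
For example, a quick way to check for native XPU support without IPEX might be (untested sketch, assuming the upstream torch.xpu API that ships with PyTorch 2.5+):

```python
# Untested sketch, assuming the upstream torch.xpu API in PyTorch 2.5+:
# check for a native XPU device without importing IPEX.
import torch

def has_native_xpu() -> bool:
    # torch.xpu only exists on builds with Intel GPU support; guard with hasattr
    # so this also runs on older PyTorch versions.
    return hasattr(torch, "xpu") and torch.xpu.is_available()

print(f"PyTorch {torch.__version__}, native XPU available: {has_native_xpu()}")
```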

jeffhataws commented 1 month ago

@youkaichao, why 2.5 and not 2.4?

youkaichao commented 1 month ago

@youkaichao, why 2.5 and not 2.4?

we need to leave some time for hardware vendors to catch up.

zhouyuan commented 1 month ago

@youkaichao @ghchris2021 Per our offline discussion, IPEX XPU will do a release around 10/17 to support this.

CC @tye1

thanks, -yuan

youkaichao commented 1 month ago

After discussing with hardware vendors, the final process will be:

  1. When a new PyTorch version comes out, vLLM will try to upgrade to the newest version first.
  2. We will then open one issue, and all hardware vendors need to respond there with their timeline for supporting the new PyTorch version.
  3. The vLLM team reserves the right to drop a hardware backend if it falls behind or cannot support the new PyTorch version within the timeline its vendor promised.
  4. If a hardware backend is dropped, the vendor can still maintain it in their own fork. vLLM will delete that backend from the main repository, but will leave a link in the documentation referring users to that fork.

ilya-lavrenov commented 1 month ago

@ilya-lavrenov that is good to know. Can you change OpenVINO to use the same PyTorch version (currently 2.4) as the default case?

Sure, please have a look at https://github.com/vllm-project/vllm/pull/9121

youkaichao commented 3 weeks ago

When a new PyTorch version comes out, vLLM will try to upgrade to the newest version first.

Defining this "when": it should be the first vLLM release that ships with the newest PyTorch version.

molereddy commented 2 days ago

@youkaichao I just upgraded from v0.6.2 to v0.6.4.post1, which also upgraded my torch version from 2.4 to 2.5. But this results in the following new error when attempting >>> from vllm import LLM, SamplingParams:

ERROR! Intel® Extension for PyTorch* needs to work with PyTorch 2.4.*, but PyTorch 2.5.1+cu124 is found. Please switch to the matching version and run again.
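
For anyone hitting the same thing, a quick hypothetical diagnostic (not part of vLLM) to confirm the clashing packages in the environment:

```python
# Hypothetical diagnostic (not part of vLLM): list the packages involved to
# confirm a CUDA PyTorch build is installed alongside the Intel extension.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "intel_extension_for_pytorch", "vllm"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```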

youkaichao commented 2 days ago

PyTorch 2.5.1+cu124

How did you install the Intel version while having a CUDA build of PyTorch installed?

zhouyuan commented 2 days ago

@molereddy Hi, if the target is to use the CUDA backend, you can simply remove the Intel extension with pip uninstall intel_extension_for_pytorch

thanks, -yuan

molereddy commented 2 days ago

@zhouyuan that worked. I'm unsure how the Intel extension ended up being installed in my vLLM environment, but removing it isn't breaking anything.