Open wangshuai09 opened 2 weeks ago
we can do it step by step.
is_cpu -> current_platform.is_cpu
is_xpu -> current_platform.is_xpu
is_openvino -> current_platform.is_openvino
is_neuron -> current_platform.is_neuron
this can be the first step, and should be easy to do.
the rest might need some case-by-case discussion.
Motivation.
vLLM has already been adapted to many hardware devices, such as GPU, TPU, and XPU. However, adapting these backends requires implementing separate Worker/Executor/Model Runner frameworks for each, which leads to code redundancy and maintenance difficulties. In fact, this hardware framework code can be abstracted at the device layer to form a unified framework. This way, only one set of code would need to be maintained, and each backend would only need to implement the device-layer interfaces and any device-specific logic if necessary. I also found that some new features are only added to GPU-related code. This code is often applicable to other hardware as well, but it is difficult for other backends to notice and follow these updates.
Proposed Change.
This RFC is intended to establish a unified framework. Integrating each hardware framework into a common framework may be difficult, but it makes sense to work in this direction. The diagram below represents a proposed solution:
Taking Executor as an example: for third-party hardware devices based on the pytorch ecosystem, the basic torch interfaces are already well adapted. So after abstracting away device-related hard-coding such as torch.cuda and torch.xpu, GPU Executor could be used as the Common Executor for all third-party devices.
Following https://github.com/vllm-project/vllm/pull/6080, different hardware backends can put their own device-specific code in NewBackendPlatform, so that the framework can be device-agnostic through current_platform. For example, torch.cuda.synchronize could be replaced by current_platform.synchronize.
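To make the current_platform idea concrete, here is a minimal sketch in plain Python. It is not vLLM's actual Platform class: the registry, resolve_platform, run_step, and the sync_calls counter are hypothetical names invented for illustration, and the stand-in backend only counts calls where a real one would forward to its torch device module.

```python
from abc import ABC, abstractmethod


class Platform(ABC):
    """Hypothetical device-layer interface (an assumption, not vLLM's real API)."""

    device_type: str = "unknown"

    def is_cpu(self) -> bool:
        return self.device_type == "cpu"

    @abstractmethod
    def synchronize(self) -> None:
        """Block until all queued device work has finished."""

    @abstractmethod
    def device_count(self) -> int:
        """Number of devices visible to this process."""


class CpuPlatform(Platform):
    device_type = "cpu"

    def synchronize(self) -> None:
        # CPU execution is synchronous, so there is nothing to wait for.
        return None

    def device_count(self) -> int:
        return 1


class NewBackendPlatform(Platform):
    """Stand-in for a third-party backend."""

    device_type = "new_backend"

    def __init__(self, num_devices: int = 2) -> None:
        self._num_devices = num_devices
        self.sync_calls = 0  # illustration only: lets us observe the dispatch

    def synchronize(self) -> None:
        # A real backend would forward to its own runtime here,
        # the way a CUDA platform would call torch.cuda.synchronize().
        self.sync_calls += 1

    def device_count(self) -> int:
        return self._num_devices


def resolve_platform(name: str) -> Platform:
    # Hypothetical registry; real detection logic would probe the environment.
    registry = {"cpu": CpuPlatform, "new_backend": NewBackendPlatform}
    return registry[name]()


current_platform = resolve_platform("new_backend")


def run_step(platform: Platform) -> int:
    # Framework code: no torch.cuda / torch.xpu references, only the platform.
    platform.synchronize()
    return platform.device_count()
```

Under this sketch, framework code such as run_step never names a device module; supporting a new device means adding one Platform subclass and a registry entry.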
Feedback Period.
Realizing this idea will touch many files, so the following steps are currently planned to achieve the goal above:
is_cpu -> current_platform.is_cpu
is_xpu -> current_platform.is_xpu
is_openvino -> current_platform.is_openvino
is_neuron -> current_platform.is_neuron
seed_everything -> current_platform.seed_everything
is_pin_memory_available -> current_platform.is_pin_memory_available
DeviceMemoryProfiler -> current_platform.memory_profiler
wrap_device -> current_platform.wrap_device
torch.xxx.get_device_name -> current_platform.get_device
torch.xxx.Event -> current_platform.Event
torch.xxx.synchronize -> current_platform.synchronize
torch.xxx.Stream -> current_platform.Stream
torch.xxx.stream -> current_platform.stream
torch.xxx.empty_cache -> current_platform.empty_cache
torch.xxx.device_count -> current_platform.device_count
torch.xxx.memory_allocated -> current_platform.memory_allocated
torch.xxx.set_device -> current_platform.set_device
torch.xxx.current_device -> current_platform.current_device
torch.xxx.get_device_capability -> current_platform.get_device_capability
gpu(neuron,openvino,tpu,xpu,..)_executor -> common_backend_executor
gpu(neuron,openvino,tpu,xpu,..)_worker -> common_backend_worker
gpu(neuron,openvino,tpu,xpu,..)_model_runner -> common_backend_model_runner
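The last three steps in the list above depend on the earlier ones: once device calls go through a platform object, a single worker can serve every backend. The sketch below shows one possible shape, under stated assumptions. CommonBackendWorker, DevicePlatform, RecordingPlatform, and the profile method are hypothetical and do not mirror vLLM's real worker; the recording platform only simulates memory so the example runs without any device.

```python
from typing import Callable, List, Protocol


class DevicePlatform(Protocol):
    """Hypothetical subset of the current_platform methods a worker needs."""

    def set_device(self, index: int) -> None: ...
    def empty_cache(self) -> None: ...
    def memory_allocated(self) -> int: ...
    def synchronize(self) -> None: ...


class CommonBackendWorker:
    """One worker for all backends; the device is reached only via `platform`."""

    def __init__(self, platform: DevicePlatform, local_rank: int) -> None:
        self.platform = platform
        self.local_rank = local_rank
        self.baseline_bytes = 0

    def init_device(self) -> None:
        self.platform.set_device(self.local_rank)
        self.platform.empty_cache()
        # Remember what was allocated before any model work runs.
        self.baseline_bytes = self.platform.memory_allocated()

    def profile(self, run_model: Callable[[], None]) -> int:
        """Return the bytes allocated by `run_model` on top of the baseline."""
        run_model()
        self.platform.synchronize()
        return self.platform.memory_allocated() - self.baseline_bytes


class RecordingPlatform:
    """Fake backend for illustration: logs calls and simulates allocations."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self._allocated = 0

    def set_device(self, index: int) -> None:
        self.calls.append(f"set_device:{index}")

    def empty_cache(self) -> None:
        self.calls.append("empty_cache")

    def memory_allocated(self) -> int:
        return self._allocated

    def synchronize(self) -> None:
        self.calls.append("synchronize")

    def allocate(self, nbytes: int) -> None:
        # Test helper standing in for real tensor allocations.
        self._allocated += nbytes


platform = RecordingPlatform()
worker = CommonBackendWorker(platform, local_rank=0)
worker.init_device()
used_bytes = worker.profile(lambda: platform.allocate(1024))
```

Using a structural Protocol rather than a base class is a design choice of this sketch: any backend object that provides these four methods would fit, without inheriting from a shared class.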
There are surely omissions here, and difficulties will come up during actual implementation; this list will be kept updated.
CC List.
@youkaichao @WoosukKwon
Any Other Things.
No response