Open ByronHsu opened 1 month ago
Just curious: are the following imports in model_runner.py also being considered for removal in later stages?
from vllm.config import DeviceConfig, LoadConfig
from vllm.config import ModelConfig as VllmModelConfig
from vllm.distributed import (
    get_tp_group,
    init_distributed_environment,
    initialize_model_parallel,
    set_custom_all_reduce,
)
from vllm.distributed.parallel_state import in_the_same_node_as
from vllm.model_executor.model_loader import get_model
from vllm.model_executor.models import ModelRegistry
UPDATE(11/23/2024)
Currently, @james-p-xu is removing rope, @yizhang2077 is removing distributed, and @HandH1998 is removing the weight loader. Optimistically, we can remove these dependencies by the end of the month and make quant optional (try import). cc @merrymercy @Ying1123
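For reference, the "try import" idea for making quant optional could look roughly like the sketch below. This is only an illustration, not the actual implementation: `try_import` and `load_quant_backend` are hypothetical helper names, and the default module path `vllm.model_executor.layers.quantization` is an assumption about where the quantization code lives.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package (e.g. vLLM not installed) ends up here as well.
        return None


def load_quant_backend(
    # Assumed module path, used only for illustration.
    module_name: str = "vllm.model_executor.layers.quantization",
) -> ModuleType:
    """Load the optional quantization backend, or fail with a clear message."""
    module = try_import(module_name)
    if module is None:
        raise RuntimeError(
            f"Quantization requires the optional module '{module_name}', "
            "which is not installed."
        )
    return module
```

With this shape, the general model code never imports vLLM at module load time; only code paths that actually request quantization call `load_quant_backend()` and see the error when the dependency is missing.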
Motivation
This issue tracks the removal of vLLM dependencies from the general model code (quantization is not considered here). Below are our current imports from vLLM; we want to remove all of them.
Tracker
CacheConfig: https://github.com/sgl-project/sglang/pull/1658
get_tensor_model_parallel_world_size
ParallelLMHead: https://github.com/sgl-project/sglang/pull/1856
VocabParallelEmbedding: https://github.com/sgl-project/sglang/pull/1856