vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: How to Enable vLLM to Work with PreTrainedModel Objects in my MoE-LoRA? THX #5128

Open zhaofangtao opened 5 months ago

zhaofangtao commented 5 months ago

🚀 The feature, motivation and pitch

I have fine-tuned multiple LoRA models to act as expert layers within an MoE architecture. How can I use vLLM to accelerate this setup? Currently, vLLM's entry point, e.g. `LLM(model_path)`, only accepts a model path or Hugging Face repo id. My MoE, however, is an architecture assembled in memory. How could vLLM be adapted to load a model object directly, for instance `LLM(model_obj)`, where the object is a `PreTrainedModel`?
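
For reference, a minimal sketch of what is possible with vLLM's existing path-based API plus its multi-LoRA support. The paths (`./my_base`, `./loras/expert_0`) and the adapter name are placeholders; note that an in-memory `PreTrainedModel` would first have to be serialized to disk, and that vLLM routes a whole request to a single adapter rather than doing MoE-style token-level routing across experts:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# vLLM only accepts a path / HF repo id, so an in-memory PreTrainedModel
# must first be serialized, e.g. base_model.save_pretrained("./my_base").
llm = LLM(model="./my_base", enable_lora=True, max_loras=4)

# Each fine-tuned LoRA "expert" can be attached per request, but this is
# request-level adapter selection, not MoE routing inside the model.
outputs = llm.generate(
    ["What is the capital of France?"],
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("expert_0", 1, "./loras/expert_0"),
)
print(outputs[0].outputs[0].text)
```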

Alternatives

Train a custom MoE model end-to-end and register it as a new, custom model type within vLLM, so it can be accelerated through the standard configuration path (see the sketch below).
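
A minimal sketch of that alternative using vLLM's model registry. `MyMoELoRAForCausalLM` and the `my_moe_lora` module are hypothetical; the class would have to implement vLLM's model interface (`load_weights`, `forward`, `compute_logits`, ...), with the MoE routing over the merged LoRA experts living inside it:

```python
from vllm import LLM, ModelRegistry

# Hypothetical module/class implementing the custom MoE-over-LoRA model.
from my_moe_lora import MyMoELoRAForCausalLM

# Register under the architecture name declared in the checkpoint's
# config.json ("architectures": ["MyMoELoRAForCausalLM"]).
ModelRegistry.register_model("MyMoELoRAForCausalLM", MyMoELoRAForCausalLM)

# After registration, the merged checkpoint loads through the normal
# path-based entry point.
llm = LLM(model="./my_moe_lora_checkpoint")
```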

Additional context

The base model is Llama, BLOOMZ, Qwen, etc.

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!