vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: How to Enable vLLM to Work with PreTrainedModel Objects in my MoE-LoRA? THX #5128

Open zhaofangtao opened 5 months ago

zhaofangtao commented 5 months ago

🚀 The feature, motivation and pitch

I have fine-tuned multiple LoRA models to act as expert layers within an MoE architecture. How can I use vLLM to accelerate this setup? Currently, vLLM's entry point, e.g. `LLM(model_path)`, only accepts a model path or Hugging Face repo id. My MoE, however, is an architecture assembled in memory. How could vLLM be adapted to load a model object directly, for instance `LLM(model_obj)`, where the object is a `PreTrainedModel`?
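
For reference, a minimal sketch of what is possible with vLLM's existing path-based API plus its multi-LoRA support. The paths (`./my_base`, `./loras/expert_0`) and the adapter name are placeholders; note that an in-memory `PreTrainedModel` would first have to be serialized to disk, and that vLLM routes a whole request to a single adapter rather than doing MoE-style token-level routing across experts:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# vLLM only accepts a path / HF repo id, so an in-memory PreTrainedModel
# must first be serialized, e.g. base_model.save_pretrained("./my_base").
llm = LLM(model="./my_base", enable_lora=True, max_loras=4)

# Each fine-tuned LoRA "expert" can be attached per request, but this is
# request-level adapter selection, not MoE routing inside the model.
outputs = llm.generate(
    ["What is the capital of France?"],
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("expert_0", 1, "./loras/expert_0"),
)
print(outputs[0].outputs[0].text)
```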

Alternatives

Train a custom MoE model end-to-end and register it as a new, custom model type within vLLM, so it can be accelerated through the standard configuration path (see the sketch below).
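
A minimal sketch of that alternative using vLLM's model registry. `MyMoELoRAForCausalLM` and the `my_moe_lora` module are hypothetical; the class would have to implement vLLM's model interface (`load_weights`, `forward`, `compute_logits`, ...), with the MoE routing over the merged LoRA experts living inside it:

```python
from vllm import LLM, ModelRegistry

# Hypothetical module/class implementing the custom MoE-over-LoRA model.
from my_moe_lora import MyMoELoRAForCausalLM

# Register under the architecture name declared in the checkpoint's
# config.json ("architectures": ["MyMoELoRAForCausalLM"]).
ModelRegistry.register_model("MyMoELoRAForCausalLM", MyMoELoRAForCausalLM)

# After registration, the merged checkpoint loads through the normal
# path-based entry point.
llm = LLM(model="./my_moe_lora_checkpoint")
```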

Additional context

The base model is Llama, BLOOMZ, Qwen, etc.

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!