Open zhaofangtao opened 5 months ago
🚀 The feature, motivation and pitch
I have fine-tuned multiple LoRA models to act as expert layers within an MoE architecture. How can I use vLLM to accelerate inference for this setup? Currently, vLLM only accepts a model path, e.g. LLM(model_path). However, my MoE is assembled in memory as an architecture of its own. Could vLLM be adapted to support loading a model object directly, e.g. LLM(model_obj), where the object type could be PreTrainedModel?
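For the LoRA-experts part of this request, vLLM's existing multi-LoRA support may already cover some use cases: the base model is loaded from a path, and individual fine-tuned adapters are selected per request. A minimal sketch, assuming vLLM is installed with LoRA support enabled; the base model name and adapter paths below are placeholders:

```python
# Sketch: serving several fine-tuned LoRA "experts" on one base model
# with vLLM's multi-LoRA support, selecting an adapter per request.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora=True lets generate() accept per-request LoRA adapters.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
params = SamplingParams(temperature=0.0, max_tokens=64)

# Each expert is a separately trained LoRA adapter on disk
# (paths are hypothetical). LoRARequest takes a name, an integer id,
# and the local adapter path.
outputs = llm.generate(
    "Explain mixture-of-experts in one sentence.",
    params,
    lora_request=LoRARequest("expert_math", 1, "/path/to/lora_expert_math"),
)
print(outputs[0].outputs[0].text)
```

This routes one adapter per request rather than mixing experts inside a single forward pass, so it is not a substitute for a true MoE forward, but it avoids needing to pass an in-memory model object.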
Alternatives
Train a custom MoE model and register it as a new model type (a custom model) within vLLM, enabling acceleration through configuration settings alone.
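The alternative above corresponds to vLLM's out-of-tree model registration path. A minimal sketch, assuming the custom MoE class (MyMoEForCausalLM, a hypothetical name) has been implemented against vLLM's internal model interface:

```python
# Sketch: registering a custom MoE architecture so vLLM can load it
# from a checkpoint path with LLM(model_path), instead of requiring
# an in-memory model object.
from vllm import ModelRegistry

# my_moe_model is an assumed user module implementing the architecture
# against vLLM's model interface (attention, linear, and MoE layers).
from my_moe_model import MyMoEForCausalLM

ModelRegistry.register_model("MyMoEForCausalLM", MyMoEForCausalLM)

# After registration, any checkpoint whose config.json declares
#   "architectures": ["MyMoEForCausalLM"]
# can be loaded the usual way, e.g. LLM(model="/path/to/my_moe_checkpoint").
```

The trade-off is that the LoRA experts must be merged or re-expressed as weights of the registered architecture, since vLLM loads parameters from the checkpoint rather than from a live PreTrainedModel object.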
Additional context
The base model is Llama, BLOOMZ, Qwen, etc.