vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: Does fused_moe/fused_moe.py only support num_expert = 8 and 16? #4433

Open LitLeo opened 6 months ago

LitLeo commented 6 months ago

Your current environment

version: v0.4.1

How would you like to use vllm

I want to run inference on a MoE model with num_expert=64, but I don't know how to integrate it with vLLM.

https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/layers/fused_moe/configs — in this directory I only see configuration files for E=8 and E=16. Can other sizes, such as 32 and 64, be supported?
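
For reference, those JSON files appear to be per-device Triton tuning tables for the fused MoE kernel, keyed by token batch size, and the kernel seems to fall back to a heuristic default config when no matching file is found, so other expert counts should still run (just possibly untuned). Below is a minimal sketch of what such a config file looks like, assuming the `E=...,N=...,device_name=....json` naming convention used in that directory; the expert count, N, device name, and block sizes here are illustrative placeholders, not tuned values.

```python
# Hedged sketch: write an example tuning file for a hypothetical 64-expert model.
# Real values should come from benchmarking the fused MoE kernel on the target GPU.
import json

# Placeholder values: set N to the model's per-rank intermediate size and
# device_name to the GPU reported by torch.cuda.get_device_name().
E = 64
N = 1408
device_name = "NVIDIA_A100-SXM4-80GB"

# Keys are token batch sizes (M); values are Triton kernel launch parameters.
example_config = {
    "1": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 32,
          "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 4},
    "64": {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
           "GROUP_SIZE_M": 8, "num_warps": 4, "num_stages": 4},
}

with open(f"E={E},N={N},device_name={device_name}.json", "w") as f:
    json.dump(example_config, f, indent=4)
```

If this reading is right, supporting E=32 or E=64 is mainly a matter of generating tuned entries for the target hardware rather than a hard limit in the kernel itself.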

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!