vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: `JetMoE` support #7771

Open wavy-jung opened 3 months ago

wavy-jung commented 3 months ago

🚀 The feature, motivation and pitch

Hi, I'm doing research on various MoE model architectures, and I found that the JetMoE architecture is promising for the following reasons:

  1. Inference efficiency: With only 2.2B active parameters and a low training budget, JetMoE surpasses the performance of Llama 2 models of similar size.
  2. Untapped potential: Only a single training run was reported in the JetMoE paper, so the performance upper bound of models with this architecture is likely higher.
  3. Unique architecture: The model uses Mixture of Attention heads (MoA), applying sparse expert routing to attention as well as to the feed-forward layers.
  4. Hugging Face integration: Transformers already supports this architecture (see the loading sketch after this list)!
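To ground point 4, here is a minimal sketch of loading JetMoE through Transformers. It assumes the authors' published `jetmoe/jetmoe-8b` checkpoint and a transformers release recent enough to include the JetMoE architecture:

```python
# Minimal sketch: loading JetMoE via Hugging Face Transformers.
# Assumes the published "jetmoe/jetmoe-8b" checkpoint and a
# transformers release that includes the JetMoE architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jetmoe/jetmoe-8b")
model = AutoModelForCausalLM.from_pretrained("jetmoe/jetmoe-8b")

inputs = tokenizer("The JetMoE architecture is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Once the architecture is implemented, the feature requested here would presumably be exercised through vLLM's usual entry point; this is hypothetical until support lands:

```python
# Hypothetical usage after JetMoE support lands in vLLM; the model id
# is real, but vLLM does not yet recognize this architecture (which is
# the point of this issue).
from vllm import LLM, SamplingParams

llm = LLM(model="jetmoe/jetmoe-8b")
params = SamplingParams(temperature=0.7, max_tokens=64)
print(llm.generate(["The JetMoE architecture is"], params)[0].outputs[0].text)
```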

References:

Alternatives

No response

Additional context

No response

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!