
A Closer Look into MoEs in LLMs

This repository contains the code for A Closer Look into Mixture-of-Experts in Large Language Models.

Overview :eyes:

We make an initial attempt to understand the inner workings of MoE-based large language models. Concretely, we comprehensively study the parametric and behavioral features of three recent MoE-based models (Mixtral 8x7B, DeepSeekMoE, Grok-1) and reveal several intriguing observations.

Based on these observations, we also provide suggestions on aspects such as router design and expert allocation for a broad spectrum of MoE practitioners. Check out our paper for more inspiring observations and suggestions!

Setup :wrench:

  1. Download the model checkpoints. By default, our code loads pre-downloaded models from the ckpt directory; you can also modify it to download directly from HuggingFace. The models we used (Mixtral 8x7B, DeepSeekMoE, Grok-1) are available from their corresponding HuggingFace repositories (see the download sketch after this list).

  2. Create the conda environment

    git clone https://github.com/kamanphoebe/Look-into-MoEs.git
    cd Look-into-MoEs
    conda create -n analyze --file env.txt

    After creating the conda environment, you have to select it as the Jupyter kernel.
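
If you prefer to fetch the checkpoints programmatically, the following sketch (not part of this repository) uses huggingface_hub to place them under ckpt/. The repository IDs and local folder names are assumptions; replace them with the exact checkpoints you want to analyze.

    # Sketch: pre-download the studied checkpoints into ckpt/ so the notebooks
    # can load them locally. The repo IDs below are assumptions, not taken from this repo.
    from huggingface_hub import snapshot_download

    MODELS = {
        "Mixtral-8x7B-v0.1": "mistralai/Mixtral-8x7B-v0.1",            # Mixtral 8x7B
        "deepseek-moe-16b-base": "deepseek-ai/deepseek-moe-16b-base",  # DeepSeekMoE
        "grok-1": "xai-org/grok-1",                                    # Grok-1
    }

    for local_name, repo_id in MODELS.items():
        # Download each checkpoint snapshot into a subfolder of ckpt/.
        snapshot_download(repo_id=repo_id, local_dir=f"ckpt/{local_name}")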

Usage :memo:

The two Jupyter notebooks static_analysis.ipynb and dynamic_analysis.ipynb contain the code for the experiments on static parameters and dynamic behaviours, respectively. Simply run the code blocks for the experiment you are interested in; each block is titled the same as the corresponding experiment in the paper. Note that some experiments use part of the Wikitext 103 test set, which we already provide in wikitext103_text.csv.
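
To give a flavor of what a "static" experiment looks like, here is a minimal sketch (not the repository's own code) that measures pairwise cosine similarity between the router (gate) weight vectors of one Mixtral layer. The module names follow the HuggingFace Mixtral implementation and may need adjusting for other checkpoints; note that loading Mixtral 8x7B in full requires substantial memory.

    # Toy static-parameter measurement: similarity between router weight rows.
    # Assumes the Mixtral checkpoint has been downloaded to ckpt/ (see Setup).
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "ckpt/Mixtral-8x7B-v0.1",   # assumed local path under ckpt/
        torch_dtype=torch.bfloat16,
    )

    # Router (gate) weights of the first layer: shape (num_experts, hidden_size).
    gate = model.model.layers[0].block_sparse_moe.gate.weight
    gate = torch.nn.functional.normalize(gate.float(), dim=-1)
    print(gate @ gate.T)  # pairwise cosine similarity between experts' gate vectors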

Citation :star2:

Please cite our work if you find it useful!

@article{lo2024closer,
  title={A Closer Look into Mixture-of-Experts in Large Language Models},
  author={Lo, Ka Man and Huang, Zeyu and Qiu, Zihan and Wang, Zili and Fu, Jie},
  journal={arXiv preprint arXiv:2406.18219},
  year={2024}
}

Acknowledgement :tada:

Our configuration and modeling files for the models are adapted from the corresponding HuggingFace repositories mentioned in the Setup section. Thanks to the authors for their great work!