pjlab-sys4nlp / llama-moe

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
https://arxiv.org/abs/2406.16554
Apache License 2.0

Questions about capacity_factor, score_scale_factor #52

Closed: theblackcat102 closed this issue 10 months ago

theblackcat102 commented 10 months ago

Hi, I have a question about these params found in config.json: score_scale_factor, capacity_factor.

Based on my understanding, llama-3B-MoE splits the original 11008-dimensional intermediate layer into 8 parts of 1376 dimensions each, hence the 1376 entries in the size_experts list. However, I don't quite understand how capacity_factor and score_scale_factor affect the MoE architecture. Are they needed during inference, or is the 1376 figure derived from the capacity factor?

I have read the expert construction README but found no mention of how these two values are set.

Am I missing something here?
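
For reference, here is a rough, minimal sketch of how I currently picture the routing with these two values. The function, default numbers, and comments below are my own guesses for illustration only, not the actual llama-moe implementation:

```python
import torch
import torch.nn.functional as F

# My understanding of the expert split (8 * 1376 = 11008):
num_experts = 8
intermediate_size = 11008
size_experts = [intermediate_size // num_experts] * num_experts  # [1376] * 8

def route(hidden_states, gate_logits, num_selects=2,
          capacity_factor=1.25, score_scale_factor=4.0):
    """Toy top-k routing showing where I imagine the two config values act."""
    num_tokens = hidden_states.size(0)
    scores = F.softmax(gate_logits, dim=-1)                  # (tokens, experts)
    topk_scores, topk_idx = scores.topk(num_selects, dim=-1)

    # capacity_factor: caps how many tokens each expert may process per batch
    # (tokens above the cap would be dropped or rerouted).
    capacity = int(capacity_factor * num_tokens * num_selects / num_experts)

    # score_scale_factor: rescales the gate weights so the combined expert
    # output keeps roughly the magnitude of the original dense FFN.
    topk_scores = topk_scores * score_scale_factor
    return topk_idx, topk_scores, capacity

# Example: 16 tokens with hidden size 4096.
x = torch.randn(16, 4096)
logits = torch.randn(16, num_experts)
idx, weights, cap = route(x, logits)
```

If this picture is wrong, I'd appreciate a correction.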

Spico197 commented 10 months ago

Hi there, thanks for using LLaMA-MoE~

theblackcat102 commented 10 months ago

@Spico197 understood, thanks for the clarification.