pjlab-sys4nlp / llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
https://arxiv.org/abs/2406.16554
Apache License 2.0 · 883 stars · 46 forks
Issues
#71 Code on Expert Specialization Experiments · Tangkexian · closed 3 weeks ago · 2 comments
#70 Can this be used as a means to speed up LLM inferencing? · bulaikexiansheng · closed 3 weeks ago · 2 comments
#69 Any experiments about the load balancing loss? · exhyy · closed 1 month ago · 3 comments
#68 Some questions on scripts and runtime · kevin3567 · opened 2 months ago · 1 comment
#67 per_device_train_batch_size=1, but almost all of my GPU memory is still being used up? · rzr002 · closed 8 months ago · 6 comments
#66 Some weights of LlamaMoEForCausalLM were not initialized · Minami-su · closed 8 months ago · 5 comments
#65 please update modeling_llama_moe_hf.py · Minami-su · closed 8 months ago · 5 comments
#64 If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups? · rzr002 · closed 8 months ago · 1 comment
#63 SFT: add sft contents · Spico197 · closed 8 months ago · 0 comments
#62 Partition FFNs without downsizing them? · abhinand5 · closed 8 months ago · 1 comment
#61 [Major] HF Code Cleaning · DaizeDong · closed 8 months ago · 0 comments
#60 How can we train an MoE starting from LLaMA-13B? · xyjsjruiliu · closed 9 months ago · 1 comment
#59 Update README.md · DaizeDong · closed 9 months ago · 0 comments
#58 About dataset preparation · bestfleer · closed 8 months ago · 1 comment
#57 Can you report the running time on hardware? · qiuzh20 · opened 10 months ago · 0 comments
#56 How to split "down" by "up" when using clustering to construct experts? · Attention-is-All-I-Need · closed 8 months ago · 4 comments
#55 How many LLaMA models are used for constructing LLaMA-MoE: multiple models or a single one? · ZeyuTeng96 · opened 10 months ago · 7 comments
#54 ./scripts/expert_construction/split/run_split_random.sh: line 18: srun: command not found · 18600709862 · closed 8 months ago · 4 comments
#53 About the cosine LR scheduler · ftgreat · closed 10 months ago · 2 comments
#52 Questions about capacity_factor, score_scale_factor · theblackcat102 · closed 10 months ago · 2 comments
#51 [Feature Request] Accelerated deployment · Xingxiangrui · opened 10 months ago · 2 comments
#50 PUBLISH: update citation info · Spico197 · closed 10 months ago · 0 comments
#49 Performance comparison between LLaMA-MoE and the original dense model · DoubleVII · closed 10 months ago · 2 comments
#48 About Chinese-language performance · WangRongsheng · closed 10 months ago · 2 comments
#47 Why a new llama_lr_scheduling_trainer instead of the original Trainer? What does it do? · linyubupa · closed 10 months ago · 1 comment
#46 fix typo · Spico197 · closed 11 months ago · 0 comments
#45 PUBLISH: upload technical report · Spico197 · closed 11 months ago · 0 comments
#44 Moefication: Format Standardization (v8) · DaizeDong · closed 10 months ago · 0 comments
#43 PUBLISH: filename refactors and readme preparation · Spico197 · closed 11 months ago · 0 comments
#42 Update gate load vis, update readme · Spico197 · closed 11 months ago · 1 comment
#41 Moefication: README Update · DaizeDong · closed 11 months ago · 0 comments
#40 Moefication: Aggregation Before Release [pre-commit] · DaizeDong · closed 11 months ago · 0 comments
#39 CPT: add more args and exec scripts · Spico197 · closed 12 months ago · 1 comment
#38 CPT: add dynamic batch loading in sheared llama · Spico197 · closed 1 year ago · 1 comment
#37 CPT: add meta info during tokenization · Spico197 · closed 1 year ago · 1 comment
#36 Moefication: Residual Gate Update [pre-commit] · DaizeDong · closed 1 year ago · 0 comments
#35 CPT: add eval support · Spico197 · closed 1 year ago · 0 comments
#34 Add Residual CPT Pipeline · DaizeDong · closed 1 year ago · 0 comments
#33 Moefication: Residual MoE Config Update [pre-commit] · DaizeDong · closed 1 year ago · 0 comments
#32 Merge from Main · DaizeDong · closed 1 year ago · 0 comments
#31 CPT: fix tb logging, fix grad ckpting, faster data loading · Spico197 · closed 1 year ago · 0 comments
#30 Moefication: Format Standardization (v4 v5) & Major Method Update · DaizeDong · closed 1 year ago · 1 comment
#29 Moefication: Format Standardization (v4) & Residual MoE Update · DaizeDong · closed 1 year ago · 0 comments
#28 Merge from main · DaizeDong · closed 1 year ago · 0 comments
#27 CPT: update `save_optim_limit`, update 13B scripts · Spico197 · closed 1 year ago · 0 comments
#26 Moefication: Gradient split analysis · DaizeDong · closed 1 year ago · 0 comments
#25 add max_tokens and lr_scheduler resuming · tongjingqi · closed 1 year ago · 0 comments
#24 Data clustering: add tokenization for clustered data, fix training & eval bugs in `moe_gates.py` · Spico197 · closed 1 year ago · 0 comments
#23 Moefication: Switch Transformers Implementation · DaizeDong · closed 1 year ago · 0 comments
#22 Moefication: Gradient split (2/2) & MoE gate re-initialization update · DaizeDong · closed 1 year ago · 1 comment