pjlab-sys4nlp / llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
https://arxiv.org/abs/2406.16554
Apache License 2.0 · 883 stars · 46 forks
Issues
#71 Code on Expert Specialization Experiments · Tangkexian · closed 3 weeks ago · 2 comments
#70 Can this be used as a means to speed up LLM inferencing? · bulaikexiansheng · closed 3 weeks ago · 2 comments
#69 Any experiments about the load balancing loss? · exhyy · closed 1 month ago · 3 comments
#68 Some questions on scripts and runtime · kevin3567 · opened 2 months ago · 1 comment
#67 per_device_train_batch_size=1, but almost all of my GPU memory is still being used up? · rzr002 · closed 8 months ago · 6 comments
#66 Some weights of LlamaMoEForCausalLM were not initialized · Minami-su · closed 8 months ago · 5 comments
#65 please update modeling_llama_moe_hf.py · Minami-su · closed 8 months ago · 5 comments
#64 If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups? · rzr002 · closed 8 months ago · 1 comment
#63 SFT: add sft contents · Spico197 · closed 8 months ago · 0 comments
#62 Partition FFNs without downsizing them? · abhinand5 · closed 8 months ago · 1 comment
#61 [Major] HF Code Cleaning · DaizeDong · closed 8 months ago · 0 comments
#60 How can we train an MoE starting from LLaMA-13B? · xyjsjruiliu · closed 9 months ago · 1 comment
#59 Update README.md · DaizeDong · closed 9 months ago · 0 comments
#58 About dataset preparation · bestfleer · closed 8 months ago · 1 comment
#57 Can you report the running time on hardware? · qiuzh20 · opened 10 months ago · 0 comments
#56 How to split "down" by "up" when using clustering to construct experts? · Attention-is-All-I-Need · closed 8 months ago · 4 comments
#55 How many LLaMA models are used for constructing LLaMA-MoE: multiple models or a single one? · ZeyuTeng96 · opened 10 months ago · 7 comments
#54 ./scripts/expert_construction/split/run_split_random.sh: line 18: srun: command not found · 18600709862 · closed 8 months ago · 4 comments
#53 About the cosine LR scheduler · ftgreat · closed 10 months ago · 2 comments
#52 Questions about capacity_factor, score_scale_factor · theblackcat102 · closed 10 months ago · 2 comments
#51 [Feature Request] Accelerated deployment · Xingxiangrui · opened 10 months ago · 2 comments
#50 PUBLISH: update citation info · Spico197 · closed 10 months ago · 0 comments
#49 Performance comparison between LLaMA-MoE and the original dense model · DoubleVII · closed 10 months ago · 2 comments
#48 About Chinese-language performance · WangRongsheng · closed 10 months ago · 2 comments
#47 Why a new llama_lr_scheduling_trainer instead of the original Trainer? What does it do? · linyubupa · closed 10 months ago · 1 comment
#46 fix typo · Spico197 · closed 11 months ago · 0 comments
#45 PUBLISH: upload technical report · Spico197 · closed 11 months ago · 0 comments
#44 Moefication: Format Standardization (v8) · DaizeDong · closed 10 months ago · 0 comments
#43 PUBLISH: filename refactors and readme preparation · Spico197 · closed 11 months ago · 0 comments
#42 Update gate load vis, update readme · Spico197 · closed 11 months ago · 1 comment
#41 Moefication: README Update · DaizeDong · closed 11 months ago · 0 comments
#40 Moefication: Aggregation Before Release [pre-commit] · DaizeDong · closed 11 months ago · 0 comments
#39 CPT: add more args and exec scripts · Spico197 · closed 12 months ago · 1 comment
#38 CPT: add dynamic batch loading in sheared llama · Spico197 · closed 1 year ago · 1 comment
#37 CPT: add meta info during tokenization · Spico197 · closed 1 year ago · 1 comment
#36 Moefication: Residual Gate Update [pre-commit] · DaizeDong · closed 1 year ago · 0 comments
#35 CPT: add eval support · Spico197 · closed 1 year ago · 0 comments
#34 Add Residual CPT Pipeline · DaizeDong · closed 1 year ago · 0 comments
#33 Moefication: Residual MoE Config Update [pre-commit] · DaizeDong · closed 1 year ago · 0 comments
#32 Merge from Main · DaizeDong · closed 1 year ago · 0 comments
#31 CPT: fix tb logging, fix grad ckpting, faster data loading · Spico197 · closed 1 year ago · 0 comments
#30 Moefication: Format Standardization (v4 v5) & Major Method Update · DaizeDong · closed 1 year ago · 1 comment
#29 Moefication: Format Standardization (v4) & Residual MoE Update · DaizeDong · closed 1 year ago · 0 comments
#28 Merge from main · DaizeDong · closed 1 year ago · 0 comments
#27 CPT: update `save_optim_limit`, update 13B scripts · Spico197 · closed 1 year ago · 0 comments
#26 Moefication: Gradient split analysis · DaizeDong · closed 1 year ago · 0 comments
#25 add max_tokens and lr_scheduler resuming · tongjingqi · closed 1 year ago · 0 comments
#24 Data clustering: add tokenization for clustered data, fix training & eval bugs in `moe_gates.py` · Spico197 · closed 1 year ago · 0 comments
#23 Moefication: Switch Transformers Implementation · DaizeDong · closed 1 year ago · 0 comments
#22 Moefication: Gradient split (2/2) & MoE gate re-initialization update · DaizeDong · closed 1 year ago · 1 comment