princeton-nlp/LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
https://arxiv.org/abs/2310.06694
MIT License
498 stars
38 forks
issues
Default Initialization of Lambda Parameters to Zero
#71
lpyhdzx
opened
1 month ago
3
Open source the pruning mask.
#70
Achazwl
closed
1 month ago
2
Support for Llama-3 / GQA?
#69
LorrinWWW
closed
1 month ago
1
Can LLM-Shearing be used on ViT models?
#68
n9s8a
opened
3 months ago
1
about shearing params config
#67
LoverLost
opened
3 months ago
1
Why the rope params are ignored while converting hf checkpoint to composer checkpoint?
#66
ZhiYuanZeng
opened
4 months ago
3
The dtype of tokenized data should be uint32
#65
ZhiYuanZeng
closed
1 month ago
1
composer model trans to pythia problem
#64
rzr002
opened
4 months ago
0
LlamaRMSNorm() layer differs from original llama
#63
suhmily
closed
4 months ago
1
The Project is not implemented for 70B llama?
#62
zhangzhenyu13
opened
4 months ago
7
Start training but only output config information
#61
Beatlesso
opened
4 months ago
3
None
#60
Beatlesso
closed
4 months ago
0
Is there a way to run pruning without Slurm?
#59
Beatlesso
closed
4 months ago
0
If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups?
#58
rzr002
closed
4 months ago
5
Instruction tuning dataset
#57
kiucho
closed
4 months ago
2
Problem encountered when saving the model
#56
18140663659
opened
5 months ago
1
Pruning fine-tuned model
#55
kiucho
closed
4 months ago
2
TypeError: buffer is too small for requested array
#54
18140663659
opened
5 months ago
0
Training starts but does not continue
#53
logan-zou
closed
5 months ago
6
mismatch shape
#52
coderchem
closed
5 months ago
0
Could you provide tokenized continue-pretraining dataset for reproduction?
#51
gywlssww
opened
6 months ago
3
When should we apply hidden_z?
#50
sbwww
closed
4 months ago
2
KeyError: 'state'
#49
changheecho
opened
6 months ago
2
Error running CheckpointSaver.close(). Skipping CheckpointSaver.post_close()
#48
rzr002
closed
6 months ago
1
Avoid OOM using deepspeed zero-stage
#47
gywlssww
opened
6 months ago
3
Training hangs during the "Building trainer" step
#46
coderchem
opened
6 months ago
1
duplicate mean values during mask initialization
#45
czhang99
closed
6 months ago
2
Release sheared model without re-training?
#44
sbwww
closed
6 months ago
4
model.prune_params() NotImplementedError: Could not run 'aten::nonzero'
#43
YanxiZSQ
opened
7 months ago
3
The implementation of dynamic batch loading code seems inconsistent with the pseudo-code in the paper
#42
YWMditto
opened
7 months ago
1
Metric Scores and NQ Evaluation
#41
Spico197
closed
6 months ago
2
Missing index.json in dataset shared on drive
#40
AnonNoNameAccount
closed
7 months ago
1
Drive dress error
#39
YanxiZSQ
closed
7 months ago
2
cannot reshape array of size 4 into shape (1,newaxis,8)
#38
rzr002
closed
7 months ago
5
meta-llama/Llama-2-7b-hf Model Preparation failed
#37
rzr002
closed
7 months ago
1
wiki proportion finally dominates at the end of the pruning stage
#36
lippman1125
closed
7 months ago
6
ShearedCodeLLama
#35
SinanAkkoyun
closed
7 months ago
3
LanguageCrossEntropy logs nan when bash pruning.sh
#34
YanxiZSQ
opened
7 months ago
6
AttributeError: module 'flash_attn.flash_attn_interface' has no attribute 'flash_attn_unpadded_func'
#33
YanxiZSQ
closed
7 months ago
1
Pruning crash at iteration 592.
#32
lippman1125
opened
7 months ago
6
Train metrics/train/github_LanguageCrossEntropy: nan
#31
lippman1125
closed
7 months ago
2
Create cleanshm.sh
#30
Longyichen
closed
7 months ago
0
KV head count on princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT ?
#29
SinanAkkoyun
closed
7 months ago
2
Docker Request
#28
TonyZhanghm
closed
6 months ago
1
Flash-attn dependency issues
#27
Forival
closed
7 months ago
1
Please share the alpaca generate and eval code and script to reproduce the results shared in
#26
sanyalsunny111
closed
8 months ago
4
Finetuning using LoRA
#25
Nimisha-Pabbichetty
closed
6 months ago
5
Path no use in continue_pretrain.sh
#24
Longyichen
closed
7 months ago
9
NotImplementedError: offload_to_cpu=True and NO_SHARD is not supported yet
#23
Longyichen
closed
8 months ago
3
How much compute will this take?
#22
fakerybakery
closed
6 months ago
7