princeton-nlp/LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
https://arxiv.org/abs/2310.06694
MIT License
562 stars
47 forks
issues
The pruned model does not match target structured config
#74
dat-browny
opened
2 months ago
0
Request for Fine-tuning Data for Continued Pre-training
#73
pupumao
opened
2 months ago
0
About the NQ EM Score in Table 2
#72
chuhac
opened
3 months ago
0
Default Initialization of Lambda Parameters to Zero
#71
lpyhdzx
opened
5 months ago
3
Open source the pruning mask.
#70
Achazwl
closed
5 months ago
2
Support for Llama-3 / GQA?
#69
LorrinWWW
closed
5 months ago
1
Can LLM-Shearing be used on ViT models?
#68
n9s8a
opened
7 months ago
1
about shearing params config
#67
LoverLost
opened
7 months ago
1
Why the rope params are ignored while converting hf checkpoint to composer checkpoint?
#66
ZhiYuanZeng
opened
8 months ago
3
The dtype of tokenized data should be uint32
#65
ZhiYuanZeng
closed
5 months ago
1
composer model trans to pythia problem
#64
rzr002
opened
8 months ago
0
LlamaRMSNorm() layer differs from original llama
#63
suhmily
closed
8 months ago
1
The Project is not implemented for 70B llama?
#62
zhangzhenyu13
opened
8 months ago
7
Start training but only output config information
#61
Beatlesso
opened
9 months ago
3
None
#60
Beatlesso
closed
9 months ago
0
Is there a way to run pruning without Slurm?
#59
Beatlesso
closed
9 months ago
0
If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups?
#58
rzr002
closed
8 months ago
5
Instruction tuning dataset
#57
kiucho
closed
8 months ago
2
save model meet problem
#56
18140663659
opened
9 months ago
1
Pruning fine-tuned model
#55
kiucho
closed
8 months ago
2
TypeError: buffer is too small for requested array
#54
18140663659
opened
9 months ago
0
Start training but nothing continue
#53
logan-zou
closed
9 months ago
6
mismatch shape
#52
coderchem
closed
10 months ago
0
Could you provide tokenized continue-pretraining dataset for reproduction?
#51
gywlssww
opened
10 months ago
3
When should we apply hidden_z?
#50
sbwww
closed
8 months ago
2
KeyError: 'state'
#49
changheecho
opened
10 months ago
2
Error running CheckpointSaver.close(). Skipping CheckpointSaver.post_close()
#48
rzr002
closed
10 months ago
1
Avoid OOM using deepspeed zero-stage
#47
gywlssww
opened
10 months ago
3
Training hangs at the "Building trainer" step
#46
coderchem
opened
10 months ago
1
duplicate mean values during mask initialization
#45
czhang99
closed
10 months ago
2
Release sheared model without re-training?
#44
sbwww
closed
10 months ago
4
model.prune_params() NotImplementedError: Could not run 'aten::nonzero'
#43
YanxiZSQ
opened
11 months ago
3
The implementation of dynamic batch loading code seems inconsistent with the pseudo-code in the paper
#42
YWMditto
opened
11 months ago
1
Metric Scores and NQ Evaluation
#41
Spico197
closed
10 months ago
2
Missing index.json in dataset shared on drive
#40
AnonNoNameAccount
closed
11 months ago
1
Drive dress error
#39
YanxiZSQ
closed
11 months ago
2
cannot reshape array of size 4 into shape (1,newaxis,8)
#38
rzr002
closed
11 months ago
5
meta-llama/Llama-2-7b-hf Model Preparation failed
#37
rzr002
closed
11 months ago
1
wiki proportion finally dominates at the end of the pruning stage
#36
lippman1125
closed
11 months ago
6
ShearedCodeLLama
#35
SinanAkkoyun
closed
11 months ago
3
LanguageCrossEntropy logs nan when bash pruning.sh
#34
YanxiZSQ
opened
11 months ago
6
AttributeError: module 'flash_attn.flash_attn_interface' has no attribute 'flash_attn_unpadded_func'
#33
YanxiZSQ
closed
11 months ago
1
Pruning crash at iteration 592.
#32
lippman1125
opened
11 months ago
6
Train metrics/train/github_LanguageCrossEntropy: nan
#31
lippman1125
closed
11 months ago
2
Create cleanshm.sh
#30
Longyichen
closed
11 months ago
0
KV head count on princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT ?
#29
SinanAkkoyun
closed
11 months ago
2
Docker Request
#28
TonyZhanghm
closed
10 months ago
1
Flash-attn dependency issues
#27
Forival
closed
11 months ago
1
Please share the alpaca generate and eval code and script to reproduce the results shared in
#26
sanyalsunny111
closed
1 year ago
4
Finetuning using LoRA
#25
Nimisha-Pabbichetty
closed
10 months ago
5