princeton-nlp LLM-Shearing issues

princeton-nlp / LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

https://arxiv.org/abs/2310.06694

MIT License

562 stars 47 forks

issues

The pruned model does not match target structured config

#74 dat-browny opened 2 months ago
0
Request for Fine-tuning Data for Continued Pre-training

#73 pupumao opened 2 months ago
0
About the NQ EM Score in Table 2

#72 chuhac opened 3 months ago
0
Default Initialization of Lambda Parameters to Zero

#71 lpyhdzx opened 5 months ago
3
Open source the pruning mask.

#70 Achazwl closed 5 months ago
2
Support for Llama-3 / GQA?

#69 LorrinWWW closed 5 months ago
1
Can LLM-Shearing be used on ViT models?

#68 n9s8a opened 7 months ago
1
about shearing params config

#67 LoverLost opened 7 months ago
1
Why are the RoPE params ignored when converting an HF checkpoint to a Composer checkpoint?

#66 ZhiYuanZeng opened 8 months ago
3
The dtype of tokenized data should be uint32

#65 ZhiYuanZeng closed 5 months ago
1
Problem converting Composer model to Pythia

#64 rzr002 opened 8 months ago
0
LlamaRMSNorm() layer differs from original llama

#63 suhmily closed 8 months ago
1
The Project is not implemented for 70B llama?

#62 zhangzhenyu13 opened 8 months ago
7
Start training but only output config information

#61 Beatlesso opened 9 months ago
3
None

#60 Beatlesso closed 9 months ago
0
Is there a way to run pruning without Slurm?

#59 Beatlesso closed 9 months ago
0
If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups?

#58 rzr002 closed 8 months ago
5
Instruction tuning dataset

#57 kiucho closed 8 months ago
2
Problem when saving the model

#56 18140663659 opened 9 months ago
1
Pruning fine-tuned model

#55 kiucho closed 8 months ago
2
TypeError: buffer is too small for requested array

#54 18140663659 opened 9 months ago
0
Training starts but does not proceed

#53 logan-zou closed 9 months ago
6
Shape mismatch

#52 coderchem closed 10 months ago
0
Could you provide the tokenized continued pre-training dataset for reproduction?

#51 gywlssww opened 10 months ago
3
When should we apply hidden_z?

#50 sbwww closed 8 months ago
2
KeyError: 'state'

#49 changheecho opened 10 months ago
2
Error running CheckpointSaver.close(). Skipping CheckpointSaver.post_close()

#48 rzr002 closed 10 months ago
1
Avoid OOM using deepspeed zero-stage

#47 gywlssww opened 10 months ago
3
Training hangs during "Building trainer"

#46 coderchem opened 10 months ago
1
duplicate mean values during mask initialization

#45 czhang99 closed 10 months ago
2
Release sheared model without re-training?

#44 sbwww closed 10 months ago
4
model.prune_params() NotImplementedError: Could not run 'aten::nonzero'

#43 YanxiZSQ opened 11 months ago
3
The implementation of dynamic batch loading code seems inconsistent with the pseudo-code in the paper

#42 YWMditto opened 11 months ago
1
Metric Scores and NQ Evaluation

#41 Spico197 closed 10 months ago
2
Missing index.json in dataset shared on drive

#40 AnonNoNameAccount closed 11 months ago
1
Drive address error

#39 YanxiZSQ closed 11 months ago
2
cannot reshape array of size 4 into shape (1,newaxis,8)

#38 rzr002 closed 11 months ago
5
meta-llama/Llama-2-7b-hf Model Preparation failed

#37 rzr002 closed 11 months ago
1
wiki proportion finally dominates at the end of the pruning stage

#36 lippman1125 closed 11 months ago
6
ShearedCodeLLama

#35 SinanAkkoyun closed 11 months ago
3
LanguageCrossEntropy logs nan when running bash pruning.sh

#34 YanxiZSQ opened 11 months ago
6
AttributeError: module 'flash_attn.flash_attn_interface' has no attribute 'flash_attn_unpadded_func'

#33 YanxiZSQ closed 11 months ago
1
Pruning crash at iteration 592.

#32 lippman1125 opened 11 months ago
6
Train metrics/train/github_LanguageCrossEntropy: nan

#31 lippman1125 closed 11 months ago
2
Create cleanshm.sh

#30 Longyichen closed 11 months ago
0
KV head count on princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT ?

#29 SinanAkkoyun closed 11 months ago
2
Docker Request

#28 TonyZhanghm closed 10 months ago
1
Flash-attn dependency issues

#27 Forival closed 11 months ago
1
Please share the alpaca generate and eval code and script to reproduce the results shared in

#26 sanyalsunny111 closed 1 year ago
4
Finetuning using LoRA

#25 Nimisha-Pabbichetty closed 10 months ago
5