Our approach is general for Transformer-based large language models. We evaluate mostly on LLaMA in our paper because of its superior performance; we have additional results on Pythia and OPT in the appendix.
As for BLOOM, do you have a particular model in mind? If so, could you share the Hugging Face model ID? I can help look into pruning it with our approach, Wanda.
Thank you for your reply. I would like to know the pruning results for the following models of different sizes on the Hugging Face Hub: bigscience/bloom-7b1, bigscience/bloom-3b, and bigscience/bloom-1b7.
Hi, we have some results on BLOOM models, summarized below (unstructured 50% sparsity):
| BLOOM | 560M | 1.1B | 1.7B | 3B | 7.1B |
|---|---|---|---|---|---|
| dense | 22.42 | 17.68 | 15.39 | 13.48 | 11.37 |
| magnitude | 2e10 | 1e6 | 2e5 | 8e6 | 2e6 |
| sparsegpt | 28.92 | 21.35 | 18.88 | 16.76 | 13.96 |
| wanda | 30.74 | 22.72 | 19.79 | 16.45 | 13.55 |
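
For context on the method: Wanda scores each weight by its magnitude times the L2 norm of the corresponding input activation (collected from a small calibration set), and drops the lowest-scoring weights within each output row. Here is a minimal PyTorch sketch of that rule; the function and variable names are illustrative, not the repo's actual API:

```python
import torch

def wanda_prune_layer(weight: torch.Tensor, inputs: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Prune one linear layer's weight with a Wanda-style score.

    weight: (out_features, in_features)
    inputs: (num_tokens, in_features) calibration activations feeding this layer
    """
    # Score each weight by |W_ij| * ||X_j||_2, the per-input-channel activation norm.
    act_norm = inputs.norm(p=2, dim=0)            # (in_features,)
    score = weight.abs() * act_norm.unsqueeze(0)  # (out_features, in_features)

    # Within each output row, zero out the lowest-scoring fraction of weights.
    num_prune = int(weight.shape[1] * sparsity)
    pruned = weight.clone()
    if num_prune > 0:
        idx = torch.argsort(score, dim=1)[:, :num_prune]  # smallest scores per row
        pruned.scatter_(1, idx, 0.0)
    return pruned

# Illustrative usage on random tensors; in practice the activations come from
# running a few calibration samples (e.g., C4) through the model, layer by layer.
W = torch.randn(8, 16)
X = torch.randn(128, 16)
W_sparse = wanda_prune_layer(W, X, sparsity=0.5)
```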