Our approach is general for Transformer-based large language models. We evaluate mostly on LLaMA in our paper because of its superior performance; we have additional results on Pythia and OPT in the appendix.
As for BLOOM, do you have a particular model in mind? If so, could you share the Hugging Face model ID? I can help look into pruning it with our approach, Wanda.
Thank you for your reply. I would like to know the pruning results for the following models of different sizes on the Hugging Face Hub: bigscience/bloom-7b1, bigscience/bloom-3b, and bigscience/bloom-1b7.
Hi, we have some results on BLOOM models, summarized below (unstructured 50% sparsity):
| BLOOM | 560M | 1.1B | 1.7B | 3B | 7.1B |
|---|---|---|---|---|---|
| dense | 22.42 | 17.68 | 15.39 | 13.48 | 11.37 |
| magnitude | 2e10 | 1e6 | 2e5 | 8e6 | 2e6 |
| sparsegpt | 28.92 | 21.35 | 18.88 | 16.76 | 13.96 |
| wanda | 30.74 | 22.72 | 19.79 | 16.45 | 13.55 |
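
For context on the method: Wanda scores each weight by its magnitude times the L2 norm of the corresponding input activation (collected from a small calibration set), and drops the lowest-scoring weights within each output row. Here is a minimal PyTorch sketch of that rule; the function and variable names are illustrative, not the repo's actual API:

```python
import torch

def wanda_prune_layer(weight: torch.Tensor, inputs: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Prune one linear layer's weight with a Wanda-style score.

    weight: (out_features, in_features)
    inputs: (num_tokens, in_features) calibration activations feeding this layer
    """
    # Score each weight by |W_ij| * ||X_j||_2, the per-input-channel activation norm.
    act_norm = inputs.norm(p=2, dim=0)            # (in_features,)
    score = weight.abs() * act_norm.unsqueeze(0)  # (out_features, in_features)

    # Within each output row, zero out the lowest-scoring fraction of weights.
    num_prune = int(weight.shape[1] * sparsity)
    pruned = weight.clone()
    if num_prune > 0:
        idx = torch.argsort(score, dim=1)[:, :num_prune]  # smallest scores per row
        pruned.scatter_(1, idx, 0.0)
    return pruned

# Illustrative usage on random tensors; in practice the activations come from
# running a few calibration samples (e.g., C4) through the model, layer by layer.
W = torch.randn(8, 16)
X = torch.randn(128, 16)
W_sparse = wanda_prune_layer(W, X, sparsity=0.5)
```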