@horseee Has this been supported yet? Can we prune the BLOOM-series models now? Also, I have a question: how can users automatically set params like block_mlp_layer_start and block_mlp_layer_end for each model?
Hi. I uploaded the code for pruning BLOOM. You can find the instructions for pruning BLOOM here.
I only ran a quick test on BLOOM-3B to make sure it works properly. As for the hyper-parameters, such as the best layers to prune (e.g., block_attention_layer_start, block_attention_layer_end) or the prompts used for pruning, I haven't searched for the optimal ones. If you find bugs in other BLOOM models, or discover better hyper-parameters, please feel free to contact me.
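Regarding automatically choosing layer ranges for each model: a common heuristic (my own assumption here, not something built into LLM-Pruner) is to skip the first and last few transformer blocks, which tend to be the most sensitive to pruning, and only prune the middle ones. A minimal sketch, with illustrative margins:

```python
def default_layer_range(num_layers, skip_front=4, skip_back=2):
    """Return (start, end) block indices for block-wise pruning.

    Skips the first `skip_front` and last `skip_back` transformer blocks.
    The margins are illustrative defaults, not tuned values.
    """
    start = min(skip_front, num_layers - 1)
    end = max(start + 1, num_layers - skip_back)
    return start, end

# e.g., for a model with 30 transformer blocks (as in BLOOM-3B):
start, end = default_layer_range(30)
print(start, end)  # prune blocks [4, 28)
```

The returned pair could then be passed as block_mlp_layer_start / block_mlp_layer_end (and likewise for the attention range), though the best values still need to be searched per model.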
Thanks a lot! Eager to try it on BLOOM... Do you think it might work on the 176B BLOOMZ?
Another question: your Discord invite link no longer works. Do you have an active one?
Hi. From the algorithm's perspective, it is entirely feasible to apply this to BLOOM 176B. However, the current algorithm requires gradient computation, and recording those gradients for a 176B model is impractical on both CPUs and GPUs. If you are considering pruning it, you might start with L2 or random pruning, although the results may be somewhat less effective.
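For reference, gradient-free L2 (magnitude) structured pruning can be sketched as below. This is a minimal illustration over a NumPy weight matrix, not LLM-Pruner's actual implementation: rows with the smallest L2 norms are dropped, so no gradients need to be recorded, which is why it scales to very large models.

```python
import numpy as np

def l2_structured_prune(weight, keep_ratio=0.75):
    """Keep the rows of `weight` with the largest L2 norms.

    Data-free structured pruning: rank rows by their L2 norm and
    retain the top `keep_ratio` fraction. Returns the pruned matrix
    and the (sorted) indices of the rows that were kept.
    """
    norms = np.linalg.norm(weight, axis=1)                # per-row L2 norm
    n_keep = max(1, int(round(weight.shape[0] * keep_ratio)))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])      # strongest rows
    return weight[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
pruned, kept = l2_structured_prune(w, keep_ratio=0.5)
print(pruned.shape)  # (4, 16)
```

In practice you would apply this per projection matrix (and mirror the row removal in the consuming layer's columns), but the ranking step itself is this simple.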
We will update the Discord link soon. Thanks for the reminder!
Hi.
LLM-Pruner is a general structural pruning method for LLMs, and it can also be used to prune BLOOM.
However, given the growing number of LLMs in the community, we have not yet run experiments to assess how our method performs on BLOOM. Recognizing the growing demand, we have prioritized BLOOM on our to-do list and will release the code and results for it promptly.