horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
https://arxiv.org/abs/2305.11627
Apache License 2.0

The quantization of the compressed models #49

Open lihuang258 opened 10 months ago

lihuang258 commented 10 months ago

If I want to further quantize the pruned model, how should I proceed? I saw quantization mentioned in the paper.
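As background for the question, post-training quantization maps float weights to low-bit integers plus a scale factor, and it can be applied to a pruned checkpoint just like to an unpruned one. Below is a minimal pure-Python sketch of symmetric per-tensor int8 quantization; it is illustrative only and is not LLM-Pruner's pipeline or any specific library's API.

```python
# Illustrative sketch (assumption: not LLM-Pruner's actual quantization code):
# symmetric per-tensor int8 quantization of a weight tensor, the kind of
# post-training step one might apply after structural pruning.

def quantize_int8(weights):
    """Map float weights to int8 values with a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 0.0]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
```

In practice one would use a dedicated toolkit (e.g. GPTQ- or bitsandbytes-style libraries) rather than hand-rolled code, since those handle per-channel scales, activation calibration, and fused kernels.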