Use quantization/pruning

marksverdhei / bert-bot

Discord bot

GNU General Public License v3.0


Open marksverdhei opened 2 years ago

marksverdhei commented 2 years ago

Reduce the size of the neural network through pruning and quantization to improve performance.

marksverdhei commented 2 years ago

We could, for instance, use Hugging Face Optimum: https://github.com/huggingface/optimum
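To make the two techniques concrete, here is a toy sketch in plain Python. It is not the Optimum API and not what the bot does today; the function names (`prune_by_magnitude`, `quantize_int8`, `dequantize`) and the example weights are made up for illustration. It assumes the simplest variants: unstructured magnitude pruning and symmetric int8 quantization with one shared scale.

```python
# Toy illustration only: plain-Python magnitude pruning and symmetric int8
# quantization of a flat list of weights. All names here are hypothetical;
# a real setup would rely on a library rather than this code.

def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale for v in q]


# Made-up example weights, not taken from any real model.
weights = [0.02, -1.27, 0.5, -0.01, 0.9, 0.003]
pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

Pruning removes the weights that contribute least, and quantization stores the remainder as 8-bit integers plus one float scale, so each restored value differs from the pruned value by at most about half of `scale`. Whether the resulting accuracy loss is acceptable for the bot would need to be measured.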