marksverdhei opened 2 years ago
Reduce neural network size by pruning and quantization for better performance
For instance, we could use Hugging Face Optimum: https://github.com/huggingface/optimum
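To make the two techniques concrete, here is a minimal, framework-free sketch of what they do to a single weight tensor. It assumes the weights are a flat list of floats. It uses unstructured magnitude pruning and symmetric per-tensor int8 quantization as illustrative choices. The function names (`magnitude_prune`, `quantize_int8`, `dequantize`) are hypothetical. This is not Optimum's API, which applies such transformations to whole models.

```python
"""Framework-free sketch of magnitude pruning and int8 quantization.

Assumption: a weight tensor is represented as a flat list of floats.
Real tooling (e.g. Hugging Face Optimum) operates on entire models instead.
"""
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127].

    Returns the quantized values and the scale needed to map them back.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, which would otherwise give scale 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.8, -0.05, 0.3, -1.27, 0.01, 0.6]
    pruned = magnitude_prune(w, sparsity=0.5)  # half the weights become 0.0
    q, scale = quantize_int8(pruned)           # 1 byte per weight instead of 4 or 8
    print(pruned)
    print(q, scale)
    print(dequantize(q, scale))
```

The size reduction comes from two sources. Pruned zeros can be stored sparsely or skipped entirely. Each int8 value needs one byte instead of the four bytes of a float32. The round-to-nearest error per weight is at most half the scale.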