Closed: ugm2 closed this issue 3 years ago

How to optimize an already fine-tuned model from Hugging Face?

Congratulations on the work, it looks amazing 😊

If there is an already fine-tuned model from Hugging Face for, let's say, generating question-answer pairs, such as valhalla/t5-base-qa-qg-hl, how could it be further optimized for inference using your method? I'm a bit lost.

Thank you in advance!
Hi @ugm2. Thanks for your interest.
1. Knowledge distillation requires new training (e.g., you can fine-tune again starting from a general distilled model such as DistilBERT, DistilRoBERTa, TinyBERT, etc.). Depending on your hardware budget you can skip this step, but it gives quite a good improvement. (A sketch of the distillation objective follows this comment.)
2. You can use structured pruning: https://github.com/microsoft/fastformers#pruning-models (see the pruning sketch below)
3. Knowledge distillation from your fine-tuned model to the pruned model: https://github.com/microsoft/fastformers#distilling-models
4. 8-bit quantization: https://github.com/microsoft/fastformers#optimizing-models-on-cpu-8-bit-integer-quantization--onnxruntime (see the quantization sketch below)

Then, you can run and benchmark: https://github.com/microsoft/fastformers#evaluating-models

Step 4 doesn't require any training.
Does this make sense?
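For steps 1 and 3, the linked FastFormers scripts drive the actual training; purely as an illustration of what a distillation objective looks like, here is a minimal sketch in PyTorch. The function name, temperature T, and mixing weight alpha are illustrative choices, not values taken from the repo:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitude is comparable across temperatures
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```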
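For step 2, FastFormers prunes structured units (attention heads and FFN dimensions) with its own script; as a generic illustration of the structured-pruning idea only, and not the repo's method, PyTorch's pruning utility can zero out whole rows of a weight matrix by their L2 norm:

```python
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(768, 3072)
# Structured pruning: drop 30% of output rows, ranked by L2 norm, so
# entire units are removed rather than scattered individual weights.
prune.ln_structured(layer, name="weight", amount=0.3, n=2, dim=0)
prune.remove(layer, "weight")  # bake the pruning mask into the weight tensor
```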
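For step 4, the README's approach is built on ONNX Runtime, whose public quantization API you can also call directly. A minimal sketch, assuming you have already exported your model to a (hypothetical) model.onnx file:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic 8-bit quantization: fp32 weights are converted to int8 offline,
# activations are quantized on the fly at inference time.
quantize_dynamic(
    model_input="model.onnx",        # hypothetical path to the exported fp32 model
    model_output="model-int8.onnx",  # the int8 model is written here
    weight_type=QuantType.QInt8,
)
```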
Yes, it does! The problem is that I need to specify a task_name and a model_type even for 8-bit quantization, and neither my task (question-answer pair generation) nor my model type (Google's T5) is available.
Thanks for the comment. The main purpose of this repository is to demonstrate the models from the FastFormers paper. At the moment, we are not planning to expand the scope.
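For readers landing here with the same problem: one generic workaround outside the FastFormers scripts is PyTorch dynamic quantization, which needs no task_name or model_type and converts T5's Linear layers to int8 for CPU inference. A minimal sketch, assuming the transformers and sentencepiece packages are installed; the prompt format is illustrative, so check the model card for the exact prefix:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned checkpoint mentioned above.
model = T5ForConditionalGeneration.from_pretrained("valhalla/t5-base-qa-qg-hl")
model.eval()

# Dynamic quantization: Linear weights become int8, activations are
# quantized on the fly; CPU-only, no task_name or model_type required.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

tokenizer = T5Tokenizer.from_pretrained("valhalla/t5-base-qa-qg-hl")
inputs = tokenizer(
    "generate question: Python was created by <hl> Guido van Rossum <hl>.",
    return_tensors="pt",
)
with torch.no_grad():
    out = quantized.generate(**inputs, max_length=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```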