microsoft / fastformers

FastFormers - highly efficient transformer models for NLU

Optimize fine-tuned model from HuggingFace #8

Closed: ugm2 closed this issue 3 years ago

ugm2 commented 3 years ago

How to optimize an already fine-tuned model from Hugging Face?

Congratulations on the work, it looks amazing 😊

Details

If there is an already fine-tuned model from Hugging Face for, say, generating question-answer pairs (such as valhalla/t5-base-qa-qg-hl), how could it be further optimized for inference using your method? I'm a bit lost.

Thank you in advance!

ykim362 commented 3 years ago

Hi @ugm2. Thanks for your interest.

  1. Knowledge distillation requires new training (e.g. you can fine-tune again starting from a general distilled model such as DistilBERT, DistilRoBERTa, TinyBERT, etc.). Depending on your hardware budget, you can skip this, but it gives quite a good improvement.

  2. You can use structured pruning. https://github.com/microsoft/fastformers#pruning-models

  3. Knowledge distillation from your fine-tuned model to the pruned model (a minimal sketch of the distillation loss follows this list). https://github.com/microsoft/fastformers#distilling-models

  4. 8-bit quantization. https://github.com/microsoft/fastformers#optimizing-models-on-cpu-8-bit-integer-quantization--onnxruntime
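For reference, the objective behind step 3 is standard soft-target (logit) distillation. A minimal sketch of the loss, not our exact training loop, with an illustrative temperature value:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimize the KL
    # divergence between the teacher's and the student's predictions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```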

Then, you can run and benchmark. https://github.com/microsoft/fastformers#evaluating-models

Step 4 doesn't require any training.
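As a rough illustration of step 4, onnxruntime ships a generic post-training dynamic quantization API. This is a minimal sketch, assuming the model has already been exported to ONNX; the file paths are placeholders, and our own script may differ in the details:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Convert the weights of an already-exported ONNX model to signed 8-bit
# integers; activations are quantized dynamically at run time.
quantize_dynamic(
    "model.onnx",        # placeholder: exported float32 model
    "model-int8.onnx",   # placeholder: quantized output
    weight_type=QuantType.QInt8,
)
```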

Does this make sense?

ugm2 commented 3 years ago

Yes, it does! The problem is that I need to specify a task_name and a model_type even for 8-bit quantization, and neither my task (question-answer pair generation) nor the model type (Google's T5) is supported.
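As a stopgap I'm looking at PyTorch's generic dynamic quantization, which doesn't need a task_name or model_type. A rough sketch of what I mean (CPU-only, quantizes just the nn.Linear layers; the prompt format is only illustrative, see the model card):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

name = "valhalla/t5-base-qa-qg-hl"
tokenizer = AutoTokenizer.from_pretrained(name)
model = T5ForConditionalGeneration.from_pretrained(name).eval()

# Post-training dynamic quantization: nn.Linear weights are stored as int8
# and dequantized on the fly; activations stay in float32.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

text = "generate question: <hl> Paris <hl> is the capital of France."
inputs = tokenizer(text, return_tensors="pt")
print(tokenizer.decode(quantized.generate(**inputs)[0], skip_special_tokens=True))
```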

ykim362 commented 3 years ago

Thanks for the comment. The main purpose of this repository is to demonstrate the models from the FastFormers paper. At the moment, we are not planning to expand the scope.