tarrade / proj_multilingual_text_classification

Explore multilingual text classification using embeddings, BERT and deep learning architectures
Apache License 2.0

How to create your own distilled model? #50

Closed vluechinger closed 4 years ago

vluechinger commented 4 years ago

As BERT and language models in general are rather large, it is worth thinking about smaller versions, especially when it comes to deployment. The trade-off depends heavily on the downstream use case: larger models should generally give better results while using up more resources.

The basic idea behind distilled models is a teacher-student architecture: a smaller student model (for example with fewer layers or attention heads removed) is trained to reproduce the outputs of the large teacher model, so that the teacher's detailed knowledge is transferred to the compact student.
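For reference, a minimal sketch (not from this repo) of how the distillation loss in such a teacher-student setup could look in PyTorch; `temperature` and `alpha` are illustrative hyper-parameters:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine the soft-target loss (match the teacher) with the usual hard-label loss."""
    # Soft targets: the student learns to reproduce the teacher's softened
    # probability distribution (a higher temperature gives softer targets).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: standard cross-entropy against the gold labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```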

Open questions:

tarrade commented 4 years ago

Out of scope, but a good idea to reduce the size of the model.