microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
MIT License

How to approach model distillation for creating a smaller + faster model #148

Open mllife opened 9 months ago

mllife commented 9 months ago

I am interested in an implementation of knowledge distillation for this specific model. This technique would allow us to transfer the knowledge and performance of a larger, resource-intensive model (the "teacher") to a smaller, more lightweight counterpart (the "student").
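
To make the idea concrete, here is the kind of training loop I am imagining: a frozen teacher supervising a smaller student with a soft-label (logit) loss and a box-imitation loss. This is only a rough sketch; `PlaceholderDETR` and all the sizes are stand-ins for a full-size and a slimmed-down TATR, not classes or configs from this repo.

```python
# Rough sketch of response-based distillation for a DETR-style detector.
# PlaceholderDETR is a stand-in, NOT a class from this repo; in practice the
# teacher would be a pretrained TATR and the student a smaller variant
# (fewer encoder/decoder layers, smaller backbone).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_QUERIES, NUM_CLASSES = 125, 6  # illustrative only; check the TATR config

class PlaceholderDETR(nn.Module):
    """Stand-in that maps an image to per-query class logits and boxes."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, hidden_dim, 7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.class_head = nn.Linear(hidden_dim, NUM_QUERIES * (NUM_CLASSES + 1))
        self.box_head = nn.Linear(hidden_dim, NUM_QUERIES * 4)

    def forward(self, x):
        feat = self.backbone(x)
        logits = self.class_head(feat).view(-1, NUM_QUERIES, NUM_CLASSES + 1)
        boxes = self.box_head(feat).view(-1, NUM_QUERIES, 4).sigmoid()
        return logits, boxes

teacher = PlaceholderDETR(hidden_dim=256).eval()  # frozen, pretrained in practice
student = PlaceholderDETR(hidden_dim=64)          # smaller model being trained
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
T = 2.0  # softmax temperature for logit distillation

def distill_step(images):
    with torch.no_grad():
        t_logits, t_boxes = teacher(images)
    s_logits, s_boxes = student(images)
    # Soft-label loss: match the teacher's class distribution per query.
    kd_cls = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Box imitation: L1 between teacher and student box predictions.
    kd_box = F.l1_loss(s_boxes, t_boxes)
    loss = kd_cls + kd_box  # + the usual supervised DETR losses on GT labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(distill_step(torch.randn(2, 3, 256, 256)))
```

One complication I am unsure about: DETR-style predictions are an unordered set, so matching student query i to teacher query i is naive; something like Hungarian matching between the two prediction sets (as in the standard DETR loss) probably makes more sense.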

Any inputs from the community on this will be really helpful. How should I approach this problem?

PS: I got this idea from PaddleStructure v2, where they used FGD (Focal and Global Knowledge Distillation for Detectors) for model size reduction; source: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/ppstructure/docs/models_list_en.md
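
For reference, my (possibly oversimplified) understanding of the "focal" part of FGD as a standalone loss: teacher and student backbone features are compared with a foreground/background split derived from the ground-truth boxes, so object regions are imitated more strongly than background. The real method also adds spatial/channel attention masks and a GcBlock-based global relation loss, which I have left out here; `fgd_like_loss` and all sizes are placeholders of my own.

```python
# Greatly simplified sketch of the foreground/background-weighted feature
# imitation at the heart of FGD. Not the full method (no attention masks,
# no global GcBlock loss). All names and sizes are illustrative.
import torch

def fgd_like_loss(s_feat, t_feat, gt_boxes, alpha=1.0, beta=0.05):
    """s_feat, t_feat: (B, C, H, W) backbone features (student projected to
    the teacher's channel count beforehand). gt_boxes: list of (N_i, 4)
    tensors of normalized cx, cy, w, h ground-truth boxes per image."""
    B, C, H, W = t_feat.shape
    fg_mask = torch.zeros(B, 1, H, W, device=t_feat.device)
    for i, boxes in enumerate(gt_boxes):
        for cx, cy, w, h in boxes:
            x0 = max(int((cx - w / 2) * W), 0)
            y0 = max(int((cy - h / 2) * H), 0)
            x1 = max(int((cx + w / 2) * W), x0 + 1)
            y1 = max(int((cy + h / 2) * H), y0 + 1)
            fg_mask[i, :, y0:y1, x0:x1] = 1.0
    diff = (s_feat - t_feat) ** 2
    # Focal idea: imitate foreground (object) regions more strongly than
    # background, instead of a uniform feature-map MSE.
    fg_loss = (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1)
    bg_loss = (diff * (1 - fg_mask)).sum() / (1 - fg_mask).sum().clamp(min=1)
    return alpha * fg_loss + beta * bg_loss

# Toy usage with random tensors in place of real backbone features.
s = torch.randn(2, 256, 32, 32, requires_grad=True)
t = torch.randn(2, 256, 32, 32)
boxes = [torch.tensor([[0.5, 0.5, 0.4, 0.3]]),
         torch.tensor([[0.25, 0.25, 0.2, 0.2]])]
loss = fgd_like_loss(s, t, boxes)
loss.backward()
print(loss.item())
```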