ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Knowledge Distillation - Teacher/student training #3665

Closed marvision-ai closed 3 years ago

marvision-ai commented 3 years ago

Motivation

In recent years, neural models have been successful in almost every field, including on extremely complex problems. However, these models are huge, with millions (or even billions) of parameters, so they either cannot be deployed on edge devices at all or run very slowly there.

Goal

To be able to train a yolov5x and distill its knowledge into a yolov5s, gaining a large speedup with minimal loss of accuracy.

What is Knowledge Distillation?

Knowledge distillation is a form of model compression in which a smaller network is taught, step by step, what to do by a bigger, already trained network. Here the 'soft labels' are the feature maps the bigger network outputs after every convolution layer. The smaller network is then trained to learn the behaviour of the bigger network by trying to replicate its outputs at every level (not just through the final loss).
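One way to write this down, as a sketch rather than the exact formulation of either reference: the student minimises its usual task loss plus a penalty for deviating from the teacher's feature maps at each matched layer. All symbols here are assumptions introduced for illustration: $F_l^{T}$ and $F_l^{S}$ are the teacher and student feature maps at matched layer pair $l$, $\mathcal{M}$ is the set of matched pairs, $\phi_l$ is an optional adapter (for example a 1x1 convolution) that maps student features to the teacher's shape, and $\lambda$ weights the two terms.

```latex
$$
\mathcal{L}_{\text{student}}
  = \mathcal{L}_{\text{task}}\big(y, \hat{y}^{S}\big)
  + \lambda \sum_{l \in \mathcal{M}} \big\lVert \phi_l\big(F_l^{S}\big) - F_l^{T} \big\rVert_2^2
$$
```

With $\lambda = 0$ this reduces to ordinary training of the student, and larger values push the student harder toward the teacher's intermediate behaviour.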


How is this different from training a model from scratch?

Obviously, the theoretical search space of a more complex model is larger than that of a smaller network. However, if we assume that the same (or even similar) convergence can be achieved using a smaller network, then the convergence space of the teacher network should overlap with the solution space of the student network.

Unfortunately, that alone does not guarantee that the student network converges at the same location. The student network may converge to a point that is hugely different from the teacher's. However, if the student network is guided to replicate the behavior of the teacher network (which has already searched through a bigger solution space), it is expected to have its convergence space overlapping with the original teacher network's convergence space.


Teacher-student networks: how exactly do they work?

  1. Train the Teacher Network: The highly complex teacher network is first trained separately on the complete dataset. This step is computationally expensive, so it is done offline (on high-performance GPUs).

  2. Establish Correspondence: While designing the student network, a correspondence needs to be established between the intermediate outputs of the student network and those of the teacher network. This correspondence can involve passing the output of a teacher layer directly to the student network, or performing some data augmentation before passing it on.


  3. Forward Pass through the Teacher Network: Pass the data through the teacher network to get all intermediate outputs, then apply data augmentation (if any) to them.

  4. Backpropagation through the Student Network: Now use the outputs from the teacher network and the correspondence relation to backpropagate the error through the student network, so that the student network learns to replicate the behavior of the teacher network.
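The steps above can be sketched in PyTorch roughly as follows. This is a minimal illustration under stated assumptions, not YOLOv5 code: `make_net`, `capture`, `distill_step`, `adapter` and `distill_weight` are hypothetical names, the two networks are tiny stand-in classifiers, the teacher is only frozen rather than actually trained, and a single pair of layers is matched with a mean-squared-error loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_net(width: int) -> nn.Sequential:
    """Tiny stand-in CNN; `width` sets its capacity (not a YOLOv5 model)."""
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),  # index 2 is hooked below
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(width, 4),
    )


def capture(module: nn.Module, store: dict, key: str):
    """Register a forward hook that saves the module's output in store[key]."""
    def hook(_module, _inputs, output):
        store[key] = output
    return module.register_forward_hook(hook)


torch.manual_seed(0)

# Step 1: the teacher is assumed to be trained already; here it is only frozen.
teacher = make_net(width=32).eval()
for p in teacher.parameters():
    p.requires_grad_(False)
student = make_net(width=8)

# Step 2: correspondence between one teacher layer and one student layer.
# The 1x1 conv adapter (an assumption) maps 8 student channels to 32.
feats = {}
capture(teacher[2], feats, "teacher")
capture(student[2], feats, "student")
adapter = nn.Conv2d(8, 32, kernel_size=1)

optimizer = torch.optim.SGD(
    list(student.parameters()) + list(adapter.parameters()), lr=0.01
)
images = torch.randn(4, 3, 16, 16)       # random stand-in batch
targets = torch.randint(0, 4, (4,))      # random stand-in labels
distill_weight = 1.0                     # hypothetical loss weighting


def distill_step() -> float:
    # Step 3: forward pass through the teacher, no gradients needed.
    with torch.no_grad():
        teacher(images)
    # Step 4: forward the student, then backpropagate the task loss plus
    # the feature-matching loss through the student (and adapter) only.
    logits = student(images)
    task_loss = F.cross_entropy(logits, targets)
    hint_loss = F.mse_loss(adapter(feats["student"]), feats["teacher"])
    loss = task_loss + distill_weight * hint_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


losses = [distill_step() for _ in range(20)]
```

For the goal in this issue, the stand-ins would be replaced by a trained yolov5x and a yolov5s, and the cross-entropy term by the detection loss. Which layers to pair, and how heavily to weight the matching term, are left open here.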

ref: https://arxiv.org/abs/2004.03281 https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764

glenn-jocher commented 3 years ago

@marvision-ai hey thanks for the post! This is a great idea, but unfortunately we are limited in resources due to maintenance and support requirements for the repository. The best way to see some of these features introduced would be to provide the updates you'd like to see in a Pull Request so we could check out your branch and get started analyzing your changes in a more quantitative way. We've created a contributing guide to get new users started:

Contribute

We love your input! We want to make contributing to YOLOv5 as easy and transparent as possible. Please see our Contributing Guide to get started.

github-actions[bot] commented 3 years ago

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!