Open wingvortex opened 2 years ago
KD currently does not support multi-GPU training. We adopted the KD method from "End-to-End Semi-Supervised Object Detection with Soft Teacher", and generating the teacher's features (I would call them guide features for the student) was a heavy operation.
The bottom line is that KD already uses two GPUs: one for the student and one for the teacher. You can check this in https://github.com/j-marple-dev/AYolov2/blob/main/distillation.py#L66
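To illustrate the split described above, here is a minimal sketch of placing the teacher and student on separate devices, with the teacher producing inference-only guide features for the student. This is a hypothetical toy (small Conv2d models, MSE feature-matching loss), not the actual code in distillation.py, and it falls back to CPU when fewer than two GPUs are available:

```python
import torch
import torch.nn as nn

# Hypothetical two-device split: teacher on one device, student on the
# other, so the teacher's guide features don't compete with the student
# for GPU memory. Falls back to CPU if fewer than two GPUs exist.
two_gpus = torch.cuda.device_count() > 1
teacher_dev = torch.device("cuda:0" if two_gpus else "cpu")
student_dev = torch.device("cuda:1" if two_gpus else "cpu")

# Toy stand-ins for the real backbones.
teacher = nn.Conv2d(3, 8, 3, padding=1).to(teacher_dev).eval()
student = nn.Conv2d(3, 8, 3, padding=1).to(student_dev)

images = torch.randn(2, 3, 32, 32)

# Teacher forward pass is inference-only: no_grad avoids storing
# activations, which is where much of the memory cost would otherwise go.
with torch.no_grad():
    guide = teacher(images.to(teacher_dev))

# Student is trained to mimic the teacher's feature map.
pred = student(images.to(student_dev))
loss = nn.functional.mse_loss(pred, guide.to(student_dev))
loss.backward()
print(tuple(guide.shape))
```

The key point is that only the student's forward pass builds an autograd graph; the teacher acts purely as a fixed feature generator on its own device.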
FYI, we have not managed to see a beneficial point using KD yet.
I had already noticed that the student and the teacher each take a GPU, and that the teacher uses quite a lot of GPU memory. Thanks for the extra information. Do you mean the performance gain is limited when applying semi-supervised object detection? In my case, the labeled-to-unlabeled data ratio is 1:2.
Hi, thanks for sharing. When I tried to use multi-GPU training for knowledge distillation:
python3 -m torch.distributed.run --nproc_per_node $N_GPU distillation.py ...
I got the error: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: distillation.py FAILED Failures: