Open wingvortex opened 2 years ago
KD currently does not support multi-GPU training. We adopted the KD method from "End-to-End Semi-Supervised Object Detection with Soft Teacher", and generating the teacher's features (I would call them guide features for the student) was a heavy operation.
The bottom line is that KD already uses two GPUs: one for the student and one for the teacher. You can check this in https://github.com/j-marple-dev/AYolov2/blob/main/distillation.py#L66
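To illustrate the split described above, here is a minimal sketch of placing the teacher and student on separate devices, with the teacher producing inference-only guide features for the student. This is a hypothetical toy (small Conv2d models, MSE feature-matching loss), not the actual code in distillation.py, and it falls back to CPU when fewer than two GPUs are available:

```python
import torch
import torch.nn as nn

# Hypothetical two-device split: teacher on one device, student on the
# other, so the teacher's guide features don't compete with the student
# for GPU memory. Falls back to CPU if fewer than two GPUs exist.
two_gpus = torch.cuda.device_count() > 1
teacher_dev = torch.device("cuda:0" if two_gpus else "cpu")
student_dev = torch.device("cuda:1" if two_gpus else "cpu")

# Toy stand-ins for the real backbones.
teacher = nn.Conv2d(3, 8, 3, padding=1).to(teacher_dev).eval()
student = nn.Conv2d(3, 8, 3, padding=1).to(student_dev)

images = torch.randn(2, 3, 32, 32)

# Teacher forward pass is inference-only: no_grad avoids storing
# activations, which is where much of the memory cost would otherwise go.
with torch.no_grad():
    guide = teacher(images.to(teacher_dev))

# Student is trained to mimic the teacher's feature map.
pred = student(images.to(student_dev))
loss = nn.functional.mse_loss(pred, guide.to(student_dev))
loss.backward()
print(tuple(guide.shape))
```

The key point is that only the student's forward pass builds an autograd graph; the teacher acts purely as a fixed feature generator on its own device.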
FYI, we have not managed to see a beneficial point using KD yet.
I had already noticed that the student and the teacher each take a GPU, and that the teacher uses quite a lot of GPU memory. Thanks for the extra information. Do you mean the performance gain is limited when applying semi-supervised object detection? In my case, the labeled-to-unlabeled data ratio is 1:2.
Hi, thanks for sharing. When I tried to use multi-GPU training for knowledge distillation:
python3 -m torch.distributed.run --nproc_per_node $N_GPU distillation.py ...
I got the error: torch.distributed.elastic.multiprocessing.errors.ChildFailedError: distillation.py FAILED Failures: