Closed — duchieuphan2k1 closed this issue 1 year ago
To use distillation on your data, you need to train a teacher model on your data first, then use the teacher model to supervise the training of the student model.
Firstly, choose a bigger model as the teacher, e.g. damoyolo-m, and train it from scratch or fine-tune it from COCO-pretrained weights on your data.
Secondly, use the pretrained teacher model to distill your target (student) model, e.g. damoyolo-s. For detailed usage of distillation, please refer to ./scripts/coco_distill.sh.
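The teacher-supervises-student step above can be sketched as a standard knowledge-distillation loss (Hinton-style soft targets). The sketch below is a generic illustration in plain Python — the temperature value and the logit vectors are made up, and DAMO-YOLO's internal distillation loss may differ:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw scores."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Generic knowledge-distillation loss, not necessarily the exact
    formulation used inside DAMO-YOLO.
    """
    p = softmax(teacher_logits, temperature)  # teacher: soft targets
    q = softmax(student_logits, temperature)  # student: predictions
    # KL(p || q), scaled by T^2 to keep gradient magnitudes comparable
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0
    )

# Identical logits give zero loss; divergent logits give a positive loss.
print(distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))      # 0.0
print(distill_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)  # True
```

During training, this term is added to the ordinary detection loss, so the student fits both the ground truth and the teacher's softened predictions.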
I am already using the biggest model (DAMO-YOLO-M). So how should I select a teacher for this model? Or can the distillation technique not be applied to the biggest model?
You can use DAMO-YOLO-M as a teacher model to perform self-distillation on itself, via
python -m torch.distributed.launch --nproc_per_node=8 tools/train.py -f configs/damoyolo_tinynasL35_M.py --tea_config configs/damoyolo_tinynasL35_M_tea.py --tea_ckpt ../damoyolo_tinynasL35_M.pth
We believe this distillation method is effective for the biggest model as well. However, as DAMO-YOLO-M is currently the largest model we offer, we can only conduct self-distillation on it. We plan to introduce larger models in the future, which can then be used to distill DAMO-YOLO-M.
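Conceptually, self-distillation means the teacher is a frozen snapshot of the pretrained M weights while only the student copy receives gradient updates. The toy step below illustrates just that mechanic — the one-weight "model", learning rate, distill weight, and data points are all made up for illustration:

```python
# Toy self-distillation step: teacher is a frozen snapshot of the
# pretrained weights; only the student copy is updated.
teacher_w = 2.0        # frozen snapshot of the pretrained weights
student_w = 2.0        # student starts from the same checkpoint
lr, alpha = 0.1, 0.5   # illustrative learning rate / distill weight

def forward(w, x):
    """Stand-in for a model forward pass: a single linear weight."""
    return w * x

for x, y in [(1.0, 2.5), (2.0, 4.5)]:   # toy (input, target) pairs
    pred_s = forward(student_w, x)
    pred_t = forward(teacher_w, x)       # no gradient flows to the teacher
    # gradient of 0.5*(pred_s - y)**2 + 0.5*alpha*(pred_s - pred_t)**2 wrt w
    grad = (pred_s - y) * x + alpha * (pred_s - pred_t) * x
    student_w -= lr * grad               # teacher_w is never touched

print(teacher_w)  # 2.0  (unchanged: frozen)
```

The task loss pulls the student toward the labels while the distillation term keeps it close to the teacher's predictions, which is why even a same-size frozen teacher can act as a regularizer.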
Could I create damoyolo_tinynasL35_M_tea.py by copying damoyolo_tinynasL35_M.py? Or is there a damoyolo_tinynasL35_M_tea.py file in some other folder? I ask because I don't see a damoyolo_tinynasL35_M_tea.py file in the ./configs folder.
Sorry for the confusion: damoyolo_tinynasL35_M_tea.py is simply a copy of damoyolo_tinynasL35_M.py. The "tea" suffix is only used to distinguish the work folder, as we save checkpoints into "./workdirs/config_file_name/" by default.
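The work-folder convention above can be sketched as follows. The helper name and the exact derivation are assumptions for illustration; only the "./workdirs/config_file_name/" convention comes from the reply:

```python
import os

def default_workdir(config_path):
    """Derive the checkpoint folder from the config file name,
    mirroring the "./workdirs/config_file_name/" convention.
    (Illustrative helper, not part of the DAMO-YOLO codebase.)"""
    name = os.path.splitext(os.path.basename(config_path))[0]
    return os.path.join("./workdirs", name)

# The "_tea" copy gets its own folder, so teacher and student
# checkpoints never overwrite each other.
print(default_workdir("configs/damoyolo_tinynasL35_M.py"))      # ./workdirs/damoyolo_tinynasL35_M
print(default_workdir("configs/damoyolo_tinynasL35_M_tea.py"))  # ./workdirs/damoyolo_tinynasL35_M_tea
```

This is why a byte-identical copy of the config is enough: the differing file name alone keeps the two runs' outputs separate.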
Before Asking
[X] I have read the README carefully.
[X] I want to train my custom dataset, and I have read the tutorials for fine-tuning on your data carefully and organized my dataset correctly.
[X] I have pulled the latest code of the main branch and run again, and the problem still exists.
Search before asking
Question
You said that you use S as the teacher to distill T, and M as the teacher to distill S, while M is distilled by itself. Is there a way for me to apply this technique on my own data?
Additional
No response