From the paper and code implementation, QAT + distillation uses its own configuration with different precisions (FP32 for the teacher and INT8 for the student). What if the teacher uses a different architecture, e.g. YOLOv6-L (teacher) -> YOLOv6-S/T (student)? Is there any benefit to this approach?
@haritsahm I tried using M/L as the teacher, but mAP did not improve. My guess is that the gap between a large teacher and a small student is too wide.
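
For anyone following along, here is a rough sketch of what a generic temperature-softened distillation term looks like. It is plain Python for illustration only and is not the loss used in the YOLOv6 code; the names `softmax` and `kd_loss` and the default temperature are assumptions made for this example. The term is zero when the student reproduces the teacher's outputs exactly and positive otherwise.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    Hypothetical helper: a generic distillation term, not YOLOv6's own loss.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2

# Made-up logits for a single prediction, just to exercise the function.
teacher = [4.0, 1.0, -2.0]
student = [2.5, 1.5, -1.0]
print(kd_loss(student, teacher))
```

In a cross-architecture setup the teacher and student heads would also need to produce outputs of matching shape before a term like this can be applied.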