Hi @xiaoachen98, thanks for your convincing work!
After reading your paper and the configs, I am a little confused about the training iterations. Did you train 40k iterations for EACH stage? If so, the total training budget comes to 240k iterations (40k for Mc, 40k for Mf, and 40k for the student, repeated twice). In that case, it might not be a fair comparison against other methods. The arithmetic I have in mind is sketched below.
Looking forward to your reply.
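For reference, here is the iteration arithmetic behind the 240k figure. The per-stage count and the number of rounds are my reading of the configs, so please correct me if I have them wrong:

```python
# Rough breakdown of the total iteration count (my assumption from the
# released configs; please correct me if this is not how training is run).
iters_per_stage = 40_000   # 40k iterations per training stage
stages_per_round = 3       # Mc, Mf, and the student
rounds = 2                 # the whole cycle is repeated twice

total_iters = iters_per_stage * stages_per_round * rounds
print(f"Total training iterations: {total_iters:,}")  # 240,000
```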
Thanks for your attention. The previous comparable works (e.g., ProDA+distill, UndoDA+distill, CPSL+distill) consist of four stages (one for warm-up, one for self-training, and two for self-distillation) and run about 240K iterations sequentially. First, our DDB can already beat them all after a single stage. Moreover, the training of Mc and Mf can be integrated into one trainer and run in parallel, although this requires more GPU memory.
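To make the "one trainer in parallel" point concrete, here is a minimal sketch of the idea. The model names, losses, and update structure are placeholders for illustration only, not the released implementation:

```python
import torch

def train_step(batch, model_c, model_f, optimizer_c, optimizer_f):
    """One joint update for the coarse (Mc) and fine (Mf) bridging models.

    Illustrative sketch only: both models are updated within the same
    iteration, so the wall-clock schedule stays at one stage, at the cost
    of holding two models (and their optimizer states) in GPU memory.
    """
    loss_c = model_c(batch).mean()   # placeholder for the coarse-bridging loss
    loss_f = model_f(batch).mean()   # placeholder for the fine-bridging loss

    optimizer_c.zero_grad()
    loss_c.backward()
    optimizer_c.step()

    optimizer_f.zero_grad()
    loss_f.backward()
    optimizer_f.step()


if __name__ == "__main__":
    # Dummy models and data, just to show the joint step running end to end.
    model_c = torch.nn.Linear(8, 2)
    model_f = torch.nn.Linear(8, 2)
    opt_c = torch.optim.SGD(model_c.parameters(), lr=0.01)
    opt_f = torch.optim.SGD(model_f.parameters(), lr=0.01)
    train_step(torch.randn(4, 8), model_c, model_f, opt_c, opt_f)
```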
OK, I see: previous works introduced extra knowledge distillation stages. Thanks for your prompt reply!