Hi @xiaoachen98, thanks for your convincing work!
After reading your paper and the configs, I am a little confused about the training iterations. Did you train 40k iterations for EACH stage? If so, the total training budget comes to 240k iterations (40k for Mc, 40k for Mf, and 40k for the student, repeated twice). In that case, it might not be a fair comparison against other methods. The arithmetic I have in mind is sketched below.
Looking forward to your reply.
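For reference, here is the iteration arithmetic behind the 240k figure. The per-stage count and the number of rounds are my reading of the configs, so please correct me if I have them wrong:

```python
# Rough breakdown of the total iteration count (my assumption from the
# released configs; please correct me if this is not how training is run).
iters_per_stage = 40_000   # 40k iterations per training stage
stages_per_round = 3       # Mc, Mf, and the student
rounds = 2                 # the whole cycle is repeated twice

total_iters = iters_per_stage * stages_per_round * rounds
print(f"Total training iterations: {total_iters:,}")  # 240,000
```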
Thanks for your attention. The previous comparable works (e.g., ProDA+distill, UndoDA+distill, CPSL+distill) consist of four stages (one for warm-up, one for self-training, and two for self-distillation) and run about 240K iterations sequentially. First, our DDB can already beat them all after a single stage. Moreover, the training of Mc and Mf can be integrated into one trainer and run in parallel, although this requires more GPU memory.
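To make the "one trainer in parallel" point concrete, here is a minimal sketch of the idea. The model names, losses, and update structure are placeholders for illustration only, not the released implementation:

```python
import torch

def train_step(batch, model_c, model_f, optimizer_c, optimizer_f):
    """One joint update for the coarse (Mc) and fine (Mf) bridging models.

    Illustrative sketch only: both models are updated within the same
    iteration, so the wall-clock schedule stays at one stage, at the cost
    of holding two models (and their optimizer states) in GPU memory.
    """
    loss_c = model_c(batch).mean()   # placeholder for the coarse-bridging loss
    loss_f = model_f(batch).mean()   # placeholder for the fine-bridging loss

    optimizer_c.zero_grad()
    loss_c.backward()
    optimizer_c.step()

    optimizer_f.zero_grad()
    loss_f.backward()
    optimizer_f.step()


if __name__ == "__main__":
    # Dummy models and data, just to show the joint step running end to end.
    model_c = torch.nn.Linear(8, 2)
    model_f = torch.nn.Linear(8, 2)
    opt_c = torch.optim.SGD(model_c.parameters(), lr=0.01)
    opt_f = torch.optim.SGD(model_f.parameters(), lr=0.01)
    train_step(torch.randn(4, 8), model_c, model_f, opt_c, opt_f)
```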
OK, I see: previous works introduced extra knowledge distillation stages. Thanks for your prompt reply!