According to the code, only the teacher model is wrapped for distributed (multi-GPU) computation: https://github.com/huawei-noah/Pretrained-Language-Model/blob/a8a705e9c8c952e078b45d1091d3f0ed161483d8/TinyBERT/general_distill.py#L348-L358
However, this contradicts my understanding. In fact, the teacher model does not need to be synchronized, because it is static and never updated, so there are no gradients to average. The student model is the one that needs to be synchronized; otherwise its gradients are not all-reduced across GPUs, and training effectively degrades to independent single-GPU training.
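To make the point concrete, here is a minimal sketch of how I would expect the wrapping to look, based on my understanding rather than the repo's actual code. The `teacher_model`/`student_model` names and the `nn.Linear` placeholders are purely illustrative stand-ins for the BERT models built in general_distill.py:

```python
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Placeholder modules standing in for the teacher/student BERT models;
# any nn.Module behaves the same way with respect to DDP.
teacher_model = nn.Linear(768, 768)
student_model = nn.Linear(768, 768)

# Assumes the script is launched with torchrun / torch.distributed.launch.
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
device = torch.device("cuda", local_rank)

# Teacher is frozen: move it to the GPU, disable gradients, skip DDP.
teacher_model = teacher_model.to(device).eval()
for p in teacher_model.parameters():
    p.requires_grad_(False)

# Student is trainable: wrap it in DDP so its gradients are all-reduced
# across ranks on every backward(); without this, each GPU trains its
# own independent copy of the student.
student_model = DDP(student_model.to(device), device_ids=[local_rank])
```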
Is my understanding wrong? Thanks in advance for any clarification!