huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

An implementation question about the distributed training of TinyBERT #106

Closed LorrinWWW closed 3 years ago

LorrinWWW commented 3 years ago

According to the code, only the teacher model is wrapped for distributed (multi-GPU) computation: https://github.com/huawei-noah/Pretrained-Language-Model/blob/a8a705e9c8c952e078b45d1091d3f0ed161483d8/TinyBERT/general_distill.py#L348-L358

However, this contradicts my understanding. The teacher model does not need to be synchronized, because its parameters are frozen and never updated. It is the student model that needs its gradients synchronized across GPUs; otherwise training effectively degrades to independent single-GPU training, as sketched below.
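For illustration, here is a minimal sketch (not the repository's actual code) of the arrangement described above. It assumes `torch.distributed` has already been initialized and that `student_model`, `teacher_model`, and `local_rank` exist as in the training script; these names are only illustrative.

```python
from torch.nn.parallel import DistributedDataParallel as DDP

# The teacher is frozen: it produces targets only, so its parameters
# receive no gradients and need no synchronization across ranks.
teacher_model.eval()
for p in teacher_model.parameters():
    p.requires_grad_(False)

# The student is the model being trained, so its gradients must be
# all-reduced across GPUs; wrapping it in DDP handles that during backward().
student_model = DDP(student_model, device_ids=[local_rank], output_device=local_rank)
```

With this arrangement, each rank runs the frozen teacher locally on its own data shard, while the student's gradients are averaged across ranks at every backward pass.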

Is my understanding wrong? Thanks for the answer!

LorrinWWW commented 3 years ago

Duplicate of #48.

1024er commented 2 years ago

Duplicate, but did you ever get an answer???