huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

a bug (?) when distilling TinyBERT on regression tasks with task_distill.py #128

Open · yellow-binary-tree opened this issue 3 years ago

yellow-binary-tree commented 3 years ago

In TinyBERT/task_distill.py, line 973:

elif output_mode == "regression":
    loss_mse = MSELoss()
    cls_loss = loss_mse(student_logits.view(-1), label_ids.view(-1))

So in the regression case TinyBERT is actually learning from the gold labels. Maybe we should use

cls_loss = loss_mse(student_logits.view(-1), teacher_logits.view(-1))

instead, so that the student learns from the teacher's logits?

itsucks commented 3 years ago

It is a bug. Distilling from the teacher logits gives a slightly better result on STS-B.
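
For reference, a minimal self-contained sketch contrasting the buggy loss with the proposed fix. This assumes PyTorch; the names `student_logits`, `teacher_logits`, and `label_ids` follow task_distill.py, while the tensor shapes (a batch of 8, one regression output) are illustrative and not taken from the repo.

```python
import torch
from torch.nn import MSELoss

torch.manual_seed(0)

# Illustrative stand-ins for the tensors in task_distill.py:
# a regression task produces one logit per example.
student_logits = torch.randn(8, 1)  # student's regression predictions
teacher_logits = torch.randn(8, 1)  # teacher's regression predictions
label_ids = torch.randn(8)          # gold labels from the dataset

loss_mse = MSELoss()

# Buggy version: the "distillation" loss regresses on the gold labels,
# so the student never sees the teacher's predictions at this step.
buggy_cls_loss = loss_mse(student_logits.view(-1), label_ids.view(-1))

# Proposed fix: match the teacher's logits instead, which is what
# prediction-layer distillation is supposed to do.
fixed_cls_loss = loss_mse(student_logits.view(-1), teacher_logits.view(-1))

print(buggy_cls_loss.item(), fixed_cls_loss.item())
```

The only change is the regression target of the MSE: the teacher's logits replace `label_ids`, making the regression branch consistent with the idea of learning from the teacher rather than directly from the labels.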