huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

When to use DFL loss #235

Closed WailordHe closed 1 year ago

WailordHe commented 1 year ago

Hi, we noticed that the default configs for n and s do not use the DFL loss. Were the results reported in the V3.0 paper without self-distillation obtained with the DFL loss disabled? In other words, should the DFL loss be enabled only when using self-distillation? Thank you!