Thanks for your attention!
We agree that there is a relation to KD, although this was not our motivation; we also noted in Section 3.2 that our loss is similar to self-distillation.
For the theoretical analysis, we mainly focus on proving 1) the relation between our objective and the bootstrapping loss, and 2) the convergence property.
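For context (this is just the standard formulation, not an excerpt from our paper), the soft bootstrapping loss of Reed et al. (2015) that we relate our objective to has the form

$$\mathcal{L}_{\text{boot}} = -\sum_{k} \big[\beta\, t_k + (1-\beta)\, q_k\big] \log q_k,$$

where $t$ is the (one-hot) label, $q$ is the model's predicted distribution, and $\beta$ is a fixed mixing weight.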
Thanks for your idea of label reweighting. I am curious about the theoretical foundation. The designed loss contains the label loss and the pseudo-label loss; the latter seemingly plays the role of a teacher model, as in knowledge distillation, and teaches the current batch during training. I think there is a related sub-field of KD, self-teaching.
Moreover, alpha and beta are both updated during training, which is new compared with KD, where the weight is usually controlled by a constant or a temperature. A rough sketch of my reading is below.
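To make my reading concrete, here is a minimal sketch (in PyTorch; names such as `pseudo_targets`, `alpha`, and `beta` are illustrative, not taken from the paper) of the kind of objective I have in mind:

```python
import torch.nn.functional as F

def combined_loss(logits, labels, pseudo_targets, alpha, beta):
    """Label loss plus a pseudo-label ("teacher"-like) loss, mixed by alpha/beta.

    This is only an illustrative sketch of the structure discussed above,
    not the authors' actual implementation.
    """
    # Cross-entropy against the given (possibly noisy) ground-truth labels.
    label_loss = F.cross_entropy(logits, labels)
    # Cross-entropy against soft pseudo-targets, e.g. the model's own earlier
    # predictions -- the term that plays the teacher role, as in self-distillation.
    log_probs = F.log_softmax(logits, dim=1)
    pseudo_loss = -(pseudo_targets * log_probs).sum(dim=1).mean()
    return alpha * label_loss + beta * pseudo_loss
```

In a training loop, `alpha` and `beta` would then be recomputed each step or epoch by whatever update rule the paper prescribes, rather than being held at a fixed constant or a KD-style temperature; that rule is not reproduced here.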