microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Missing loss term in BEiT v2 VQ-KD #1290

Closed sjtulyf123 closed 1 year ago

sjtulyf123 commented 1 year ago

Hello, in the BEiT v2 paper, the VQ-KD loss is:

[figure: the VQ-KD training objective from the BEiT v2 paper]

But in the code at https://github.com/microsoft/unilm/blob/master/beit2/modeling_vqkd.py#L213, the second loss term from the figure above seems to be missing. Was it omitted deliberately, or is there some other consideration behind this loss?
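For context, the VQ-KD training objective in question can be written roughly as follows. This is a reconstruction from memory of the BEiT v2 paper, so the exact notation may differ slightly from the original figure:

```latex
\max \sum_{x \in \mathcal{D}} \sum_{i=1}^{N}
    \cos(o_i, t_i)
    - \big\lVert \mathrm{sg}\!\left[\ell_2(h_i)\right] - \ell_2(v_{z_i}) \big\rVert_2^2
    - \big\lVert \ell_2(h_i) - \mathrm{sg}\!\left[\ell_2(v_{z_i})\right] \big\rVert_2^2
```

where $o_i$ are decoder outputs, $t_i$ the teacher features, $h_i$ the encoder outputs, $v_{z_i}$ the selected codebook vectors, $\ell_2$ denotes l2 normalization, and $\mathrm{sg}[\cdot]$ is the stop-gradient operator. The middle term (the codebook loss, which pulls code vectors toward encoder outputs) is the "second loss term" the question refers to.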

pengzhiliang commented 1 year ago

Hi, @sjtulyf123. Thanks for your interest.

Actually, we use an exponential moving average (EMA) to update the codebook instead of the VQ loss. Refer to the appendix of the VQ-VAE paper for details.
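The EMA update described in the VQ-VAE paper's appendix can be sketched like this. This is a minimal NumPy illustration of the update rule, not the actual `modeling_vqkd.py` implementation; the function name and array layout are chosen for clarity:

```python
import numpy as np

def ema_codebook_update(codebook, cluster_size, embed_sum,
                        encodings, assignments, decay=0.99, eps=1e-5):
    """One EMA step for a VQ codebook (illustrative sketch).

    codebook:     (K, D) current code vectors e_i
    cluster_size: (K,)   EMA of assignment counts N_i
    embed_sum:    (K, D) EMA of summed encoder outputs m_i
    encodings:    (B, D) encoder outputs z_j for this batch
    assignments:  (B,)   index of the nearest code for each encoding
    """
    K, _ = codebook.shape
    onehot = np.zeros((len(assignments), K))
    onehot[np.arange(len(assignments)), assignments] = 1.0

    # N_i <- decay * N_i + (1 - decay) * n_i  (batch counts per code)
    cluster_size = decay * cluster_size + (1 - decay) * onehot.sum(axis=0)
    # m_i <- decay * m_i + (1 - decay) * sum_j z_j  (summed assigned encodings)
    embed_sum = decay * embed_sum + (1 - decay) * (onehot.T @ encodings)
    # e_i <- m_i / N_i  (eps keeps rarely used codes numerically stable)
    codebook = embed_sum / (cluster_size[:, None] + eps)
    return codebook, cluster_size, embed_sum
```

Because the codebook is moved toward the encoder outputs by this running average rather than by gradients, the codebook loss term drops out of the optimized objective; only the commitment term (with stop-gradient on the codes) still needs to flow through the encoder.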

sjtulyf123 commented 1 year ago

Got it, thanks.