Closed sjtulyf123 closed 1 year ago
Hi, @sjtulyf123. Thanks for the attention.
Actually, we use an exponential moving average (EMA) to update the codebook instead of the VQ loss. Refer to the appendix of the VQ-VAE paper for details.
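For readers following along, here is a minimal NumPy sketch of the kind of EMA codebook update described in the VQ-VAE appendix. It is only an illustration, not the code in `modeling_vqkd.py`: the function name `ema_codebook_update`, its arguments, the `decay` and `eps` defaults, and the use of squared Euclidean distance for the nearest-code lookup are all assumptions made for this example. The idea is that the codebook is moved toward the encoder outputs assigned to each code by running averages, so no gradient-based codebook term is needed in the loss.

```python
import numpy as np

def ema_codebook_update(codebook, cluster_size, embed_sum, z_e,
                        decay=0.99, eps=1e-5):
    """One EMA codebook step (illustrative sketch, hypothetical helper).

    codebook:     (K, D) current code vectors
    cluster_size: (K,)   running EMA of how many encodings hit each code
    embed_sum:    (K, D) running EMA of the sum of encodings per code
    z_e:          (N, D) encoder outputs for the current batch
    """
    num_codes = codebook.shape[0]

    # Assign each encoder output to its nearest code (squared L2 distance).
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    onehot = np.eye(num_codes)[idx]  # (N, K)

    # EMA of per-code counts and per-code sums of assigned encodings.
    cluster_size = decay * cluster_size + (1.0 - decay) * onehot.sum(axis=0)
    embed_sum = decay * embed_sum + (1.0 - decay) * (onehot.T @ z_e)

    # Laplace smoothing so rarely used codes do not divide by ~0.
    total = cluster_size.sum()
    smoothed = (cluster_size + eps) / (total + num_codes * eps) * total

    # New codes are the smoothed running means; no gradient is involved.
    codebook = embed_sum / smoothed[:, None]
    return codebook, cluster_size, embed_sum, idx
```

With an update like this, only the commitment term (the one that pulls encoder outputs toward their assigned codes) would still need to appear in the training loss.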
Got it, thanks.
Hello, in the BEiT v2 paper, the VQ-KD loss is
[equation image not preserved]
but in the code https://github.com/microsoft/unilm/blob/master/beit2/modeling_vqkd.py#L213, the second loss term in the figure above seems to be missing. Is there a reason this term was left out?