Open yiqiwang8177 opened 9 months ago
Hi, I am currently studying VideoGPT and have some doubts about the VQ-VAE losses. Where is the VQ (codebook) loss? Any information to resolve this would be highly appreciated.

That is because the codebook is updated using an exponential moving average (EMA), not by the gradient of a codebook loss (see line 176 of vqvae.py). It is shown in this paper that the EMA-based update is equivalent to updating the codebook using SGD on the codebook loss.
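For illustration, here is a minimal NumPy sketch of the EMA-style codebook update described above. The function name, array shapes, and hyperparameters are illustrative assumptions, not VideoGPT's actual implementation; the real code lives in vqvae.py and uses PyTorch buffers.

```python
import numpy as np

def ema_codebook_update(codebook, cluster_size, embed_avg,
                        encodings, assignments, decay=0.99, eps=1e-5):
    """One EMA codebook update step (illustrative sketch).

    codebook:     (K, D) current codebook vectors
    cluster_size: (K,)   EMA of per-code assignment counts
    embed_avg:    (K, D) EMA of summed encoder outputs per code
    encodings:    (N, D) encoder outputs in the current batch
    assignments:  (N,)   index of the nearest code for each encoding
    """
    K, D = codebook.shape
    onehot = np.zeros((len(assignments), K))
    onehot[np.arange(len(assignments)), assignments] = 1.0

    # EMA of assignment counts and of the summed assigned vectors
    cluster_size = decay * cluster_size + (1 - decay) * onehot.sum(axis=0)
    embed_avg = decay * embed_avg + (1 - decay) * (onehot.T @ encodings)

    # Laplace smoothing so unused codes do not divide by zero
    n = cluster_size.sum()
    smoothed = (cluster_size + eps) / (n + K * eps) * n

    # Each code moves toward the (EMA-weighted) mean of its assigned encodings,
    # which is why no gradient-based codebook loss term is needed.
    new_codebook = embed_avg / smoothed[:, None]
    return new_codebook, cluster_size, embed_avg
```

Note that only the codebook is updated this way; the commitment loss (pulling encoder outputs toward their chosen codes) still appears in the training loss and is optimized by gradient descent.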