lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

LFQ quantizer converges quickly, leading to low codebook utilization #154

Closed · Jason3900 closed this issue 1 month ago

Jason3900 commented 2 months ago

Hey, thanks for your excellent work. I ran into a strange problem when using LFQ to train a video tokenizer (the decoder is a MAGVIT-v2-like structure). The reconstruction loss has difficulty converging because LFQ's entropy loss rises and then stays at a steady value, and when that happens the codebook utilization drops to an extremely low range. How can I explain and fix this? I appreciate your help.
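For reference, my LFQ setup follows the library's README example roughly like the sketch below (the values here are illustrative, not my exact config, and argument names are taken from the README at the time of writing):

```python
import torch
from vector_quantize_pytorch import LFQ

# codebook size must be a power of two; with dim = 16 each code is a 16-bit pattern
quantizer = LFQ(
    codebook_size = 2 ** 16,     # 65536 codes
    dim = 16,                    # latent dimension fed to the quantizer
    entropy_loss_weight = 0.1,   # weight of the entropy aux loss in the total loss
    diversity_gamma = 1.         # weight of the codebook-usage (batch entropy) term
)

feats = torch.randn(1, 16, 32, 32)

# returns quantized latents, code indices, and the entropy aux loss to add to the total loss
quantized, indices, entropy_aux_loss = quantizer(feats, inv_temperature = 100.)
```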

[Images: training curves of the LFQ entropy loss and the codebook unique ratio]

The codebook unique ratio is the percentage of unique tokens in the quantized sequence. BTW, I followed the MAGVIT-v2 paper in using an entropy decay step, and training is conducted on UCF-101 (13k videos).
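For anyone following along: as I understand the MAGVIT-v2-style entropy objective, it is the mean per-sample entropy minus the entropy of the batch-averaged code distribution, so the aux loss climbing while the unique ratio falls suggests the codebook-usage term is collapsing. A toy sketch of that decomposition (not this library's exact code):

```python
import torch

def entropy(probs, eps = 1e-9):
    # Shannon entropy along the last dimension
    return -(probs * (probs + eps).log()).sum(dim = -1)

def entropy_aux_loss(logits, diversity_gamma = 1.0):
    # logits: (num_tokens, codebook_size) affinities of each token to each code
    probs = logits.softmax(dim = -1)

    per_sample_entropy = entropy(probs).mean()       # minimized: confident assignments
    codebook_entropy = entropy(probs.mean(dim = 0))  # maximized: uniform codebook usage

    # if only a few codes get used, codebook_entropy drops and this loss climbs
    return per_sample_entropy - diversity_gamma * codebook_entropy

# toy check: confident-but-collapsed usage scores worse (higher) than confident-and-diverse
collapsed = torch.full((1024, 256), -10.0)
collapsed[:, 0] = 10.0
diverse = torch.full((1024, 256), -10.0)
diverse[torch.arange(1024), torch.arange(1024) % 256] = 10.0
print(entropy_aux_loss(collapsed), entropy_aux_loss(diverse))  # ~0.0 vs ~-5.5
```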

leolin65 commented 1 month ago

how about https://github.com/zhaoyue-zephyrus/bsq-vit/blob/main/transcoder/models/quantizer/bsq.py

Jason3900 commented 1 month ago

Thanks, I've seen this before, but in practice it seems to add two norms, one before and one after LFQ, and to scale the range of the representation. How do these ops alleviate what I'm encountering?
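To spell out the ops I mean: my reading of the linked bsq.py is roughly the sketch below, i.e. an L2 normalization onto the unit hypersphere before binarizing, followed by a 1/sqrt(d) rescaling so the quantized code is unit-norm as well (a sketch of my understanding, not the authors' exact implementation):

```python
import torch
import torch.nn.functional as F

def bsq_quantize(x):
    # x: (..., d) latents. Project onto the unit hypersphere, binarize,
    # then rescale so every code also has unit norm (entries are +-1/sqrt(d)).
    d = x.shape[-1]
    u = F.normalize(x, dim = -1)                                        # norm before quantization
    q = torch.where(u >= 0, torch.ones_like(u), -torch.ones_like(u))    # binarize
    q = q / (d ** 0.5)                                                  # scaling; output is already unit-norm

    # straight-through estimator so gradients pass to the encoder
    return u + (q - u).detach()

z = torch.randn(2, 1024, 18)    # 18-dim latent -> implicit 2^18 codebook
z_q = bsq_quantize(z)
```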


Jason3900 commented 1 month ago

#158