Jason3900 closed this issue 1 month ago
Thanks, I've seen this before. In practice, it seems to add two norms, one before and one after the LFQ step, and to rescale the range of the representation. How do these ops alleviate the problem I'm encountering?
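For context, here is a minimal numpy sketch of the normalization scheme being discussed, as I read it from the linked bsq.py (the function name and shapes here are my own, not the repo's API): the latent is L2-normalized onto the unit sphere before binarization, and the binary code is rescaled by 1/sqrt(d) so it also lands on the unit sphere. This bounds both the quantizer input and output to a fixed range regardless of how large the encoder's activations grow.

```python
import numpy as np

def bsq_style_quantize(z):
    """Sketch of BSQ-style spherical quantization (assumed, not the
    exact bsq.py code): project onto the unit sphere, binarize the
    signs, and rescale so the code also has unit L2 norm."""
    d = z.shape[-1]
    u = z / np.linalg.norm(z, axis=-1, keepdims=True)  # pre-quantization norm
    code = np.sign(u) / np.sqrt(d)                     # post-quantization rescale
    return u, code

z = 100.0 * np.random.randn(4, 8)      # arbitrarily scaled encoder output
u, code = bsq_style_quantize(z)
print(np.linalg.norm(u, axis=-1))      # all 1.0: input range is bounded
print(np.linalg.norm(code, axis=-1))   # all 1.0: code range is bounded
```

Because `u` and `code` both lie on the unit sphere, the per-sample quantization error is bounded above, which is the usual argument for why the extra norms stabilize training compared to raw-sign LFQ.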
Leonardo Lin @.***> wrote on Sunday, August 25, 2024 at 21:42:
how about https://github.com/zhaoyue-zephyrus/bsq-vit/blob/main/transcoder/models/quantizer/bsq.py
Hey, thanks for your excellent work. I ran into a strange problem when using LFQ to train a video tokenizer (the decoder is a MAGVIT-v2-like structure). The reconstruction loss has difficulty converging because LFQ's entropy loss rises and then plateaus. When that happens, codebook utilization drops to an extremely low range. How can this be explained and fixed? I'd appreciate your help.
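To make the symptom concrete, here is a hedged numpy sketch of the two entropy terms used in LFQ/MAGVIT-v2-style training (this is my simplified reading, not lucidrains' exact implementation): the per-sample entropy should go down (each latent commits to one code) while the batch-average entropy should go up (codes are spread over the codebook). The reported behavior, where the combined entropy loss rises and utilization collapses, corresponds to the batch term shrinking while the loss `per_sample - batch` grows.

```python
import numpy as np

def lfq_entropy_loss(logits):
    """Sketch (assumed form) of the LFQ entropy objective on per-dimension
    Bernoulli bit logits: minimize per-sample entropy, maximize batch entropy."""
    p = 1.0 / (1.0 + np.exp(-logits))   # probability each bit is +1
    eps = 1e-8
    H = lambda q: -(q * np.log(q + eps) + (1.0 - q) * np.log(1.0 - q + eps))
    per_sample = H(p).mean()            # confidence term: want this small
    batch = H(p.mean(axis=0)).mean()    # diversity term: want this large
    return per_sample - batch           # combined loss the trainer minimizes

# collapse: every sample confidently picks the same bits
collapsed = np.full((16, 8), 5.0)
# healthy: confident bits, but spread evenly across the batch
spread = 5.0 * np.where(np.arange(16)[:, None] % 2 == 0, 1.0, -1.0) * np.ones((16, 8))
print(lfq_entropy_loss(collapsed), lfq_entropy_loss(spread))
```

A collapsed batch yields a noticeably higher loss than a spread one, so a rising entropy loss alongside falling unique-token counts is consistent with codebook collapse rather than a bug in the metric.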
The codebook unique ratio is the percentage of unique tokens in the quantized sequence. BTW, I followed the MAGVIT-v2 paper in using an entropy decay step, and training is conducted on UCF-101 (13k videos).
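For reference, a minimal sketch of that metric as I understand the description (distinct token ids divided by sequence length; the function name is hypothetical). Normalizing by codebook size instead would give absolute codebook utilization, which is the other common convention.

```python
import numpy as np

def unique_ratio(token_ids):
    """Fraction of distinct codes in a quantized token sequence
    (sketch of the 'codebook unique ratio' described above)."""
    token_ids = np.asarray(token_ids)
    return np.unique(token_ids).size / token_ids.size

ids = [0, 1, 2, 3, 3, 3]   # hypothetical quantized token ids
print(unique_ratio(ids))   # 4 distinct ids over 6 positions
```

Tracking this per training step makes the collapse visible: in the failure mode described, the ratio drops toward 1/sequence_length as a handful of codes dominate.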