Closed bpwl0121 closed 8 months ago
Thanks for your attention~ We use the Gumbel-Softmax trick to maintain gradient flow during training.
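For reference, here is a minimal NumPy sketch of Gumbel-Softmax sampling (not the repo's code; the function name and `tau`/`hard` parameters are illustrative). The soft sample is differentiable with respect to the logits; in an autodiff framework the straight-through variant returns the hard one-hot in the forward pass while gradients flow through the soft sample:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, hard=True, rng=None):
    """Sample from a categorical distribution with the Gumbel-Softmax trick.

    logits: (N, K) unnormalized log-probabilities.
    tau: temperature; lower values make the soft sample closer to one-hot.
    hard: if True, return a one-hot sample (forward pass of straight-through).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    soft = y / y.sum(axis=-1, keepdims=True)  # differentiable soft sample
    if not hard:
        return soft
    # Hard one-hot sample; in e.g. PyTorch the straight-through estimator
    # would be: hard - soft.detach() + soft
    hard_sample = np.zeros_like(soft)
    hard_sample[np.arange(soft.shape[0]), soft.argmax(axis=-1)] = 1.0
    return hard_sample

logits = np.log(np.array([[0.7, 0.2, 0.1]]))
sample = gumbel_softmax(logits, tau=0.5)
```

As `tau` goes to zero the soft sample approaches the hard one-hot, which is why the discrete selection stays trainable.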
The Gumbel-Softmax trick does its job for sure, but I think it only relates to the mask loss. How is the codebook trained? You don't use a loss like the one in VQ-VAE, which is the common loss for codebook training.
Sorry for the late reply~ In our implementation, we first train the token predictor and freeze the other modules (without feature quantization), which aims to train a good predictor that selects the most informative patches. Then we freeze the token predictor and conduct joint VQ training with the objective of Eq. 4 in the paper to update the other modules.
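To illustrate how a quantized feature can stay differentiable without the classic VQ-VAE codebook loss, here is a hedged NumPy sketch (all names are hypothetical, not from this repo): the Gumbel-Softmax weights select codebook entries as a convex combination, so gradients can reach both the selection logits and the codebook during the joint VQ stage:

```python
import numpy as np

def quantize_with_gumbel(feature_logits, codebook, tau=1.0, rng=None):
    """Differentiable codebook lookup via Gumbel-Softmax weights.

    feature_logits: (N, K) similarity of each feature to the K codes
                    (e.g. cosine similarity), used as selection logits.
    codebook: (K, D) learnable code embeddings.
    Returns (N, D) quantized features.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    gumbel = -np.log(-np.log(rng.uniform(size=feature_logits.shape)))
    y = (feature_logits + gumbel) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    weights = y / y.sum(axis=-1, keepdims=True)  # (N, K), soft one-hot
    # Convex combination of codes; as tau -> 0 this approaches a hard
    # lookup while gradients flow to both the logits and the codebook.
    return weights @ codebook

codebook = np.random.default_rng(0).normal(size=(8, 4))
logits = np.random.default_rng(1).normal(size=(2, 8))
z_q = quantize_with_gumbel(logits, codebook, tau=0.5)
```

Under this formulation the codebook receives gradients directly from the reconstruction/similarity objective, so no separate VQ-VAE commitment loss is strictly required.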
ah, I see thanks for your reply 👍
hi,
thanks for your awesome work. I have a question regarding the loss function in the DYNAMIC VISUAL TOKENIZER: the codebook generates the discrete token Vq, so how do you train the DYNAMIC VISUAL TOKENIZER with the cosine similarity and the mask loss, given that part of the pipeline is discrete?
thx