valeoai / Maskgit-pytorch


Clarification on Additional Token Usage and Embedding in Maskgit-pytorch Transformer #16

Closed: RohollahHS closed this issue 1 month ago

RohollahHS commented 2 months ago

Hi. Thanks for the great work. I have two questions.

  1. Can you please clarify what the second "+1" is used for? codebook_size is 1024, so the codebook indices lie in [0, 1023]. The first "+1" in the code is for the mask token, which gets index 1024. nclass is 1000 for ImageNet. I do not understand the purpose of increasing the nn.Embedding size by a further 1 (the layout I have in mind is sketched after these questions). Link to code

  2. In the linked code, why is self.codebook_size+1 used instead of self.codebook_size? What is the purpose of the additional token when we then compute the cross-entropy loss? Link to code
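For concreteness, here is a minimal sketch of the embedding layout the question refers to, assuming the values quoted above (codebook_size = 1024, nclass = 1000) and a hypothetical embed_dim; the actual variable names in the repository may differ.

```python
import torch.nn as nn

codebook_size = 1024  # VQ codebook indices: 0..1023
nclass = 1000         # ImageNet class labels, stored after the codebook + mask tokens
embed_dim = 768       # hypothetical hidden size

# Assumed index layout of the shared embedding table:
#   [0, 1023]     -> image (codebook) tokens
#   1024          -> mask token           (the first "+1")
#   [1025, 2024]  -> class tokens         (nclass entries)
#   2025          -> ???                  (the second "+1" the question is about)
tok_emb = nn.Embedding(codebook_size + 1 + nclass + 1, embed_dim)
```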

llvictorll commented 2 months ago

Hello,

  1. The second "+1" is used to mask the class token for classifier-free guidance: when the class condition is dropped, the label defaults to the last index of the embedding layer.

  2. This was primarily for an internal experiment I conducted. It allows the model to also predict the mask token, but it can be safely removed if predicting the mask token itself is not your objective. (A sketch illustrating both points is given below.)
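Here is a minimal sketch of both points, using hypothetical names (drop_prob, labels, logits) and the index layout from the question; it illustrates the mechanism described above rather than the repository's exact code.

```python
import torch
import torch.nn.functional as F

codebook_size, nclass = 1024, 1000
batch, seq_len = 8, 16 * 16
drop_prob = 0.1  # hypothetical probability of dropping the class condition

# Point 1: class labels are offset behind the codebook + mask tokens, and with
# probability drop_prob a label is replaced by the last embedding index
# (the second "+1"), which acts as the "null" class for classifier-free guidance.
labels = torch.randint(0, nclass, (batch,)) + codebook_size + 1
drop = torch.rand(batch) < drop_prob
labels[drop] = codebook_size + 1 + nclass  # last index of the embedding table

# Point 2: logits span codebook_size + 1 values so the model can also predict
# the mask token itself; since the ground-truth targets only contain codebook
# indices, the extra logit can be removed if that is not your objective.
logits = torch.randn(batch, seq_len, codebook_size + 1)      # dummy predictions
targets = torch.randint(0, codebook_size, (batch, seq_len))  # true codebook indices
loss = F.cross_entropy(logits.reshape(-1, codebook_size + 1), targets.reshape(-1))
```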

Best,

Victor