Question about the masking scheduler during inference

Hello, I noticed that in your adap_sche function, you normalize the mask ratio schedule so that the mask ratios of all steps sum to one. I think I roughly understand the intention: the number of tokens retained across all steps adds up to the final number of tokens (for example, 16x16=256). However, this seems to differ from the official JAX implementation of MaskGIT (https://github.com/google-research/maskgit). There, the mask ratio goes from 1 down to 0, which means that it predicts all tokens at once in the last decoding step and retains all tokens obtained in that step. I'm not sure whether I have misunderstood it. Could you please clarify? Thanks a lot!
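To make the comparison concrete, here is a minimal sketch of how I picture the two views relating to each other. It is not taken from either code base: the function names are hypothetical, and the cosine schedule and the floor rounding are my assumptions. It treats the 1-to-0 ratio as the fraction of tokens still masked after each step, then takes the step-to-step differences to get a per-step schedule.

```python
import math


def cumulative_mask_ratios(num_steps):
    """Fraction of tokens still masked after each step (assumed cosine schedule).

    The values decrease from just below 1 to (numerically) 0 at the last step.
    """
    return [math.cos(math.pi / 2 * (t + 1) / num_steps) for t in range(num_steps)]


def tokens_revealed_per_step(num_tokens, num_steps):
    """Number of tokens newly kept at each step, derived from the cumulative ratios."""
    masked_before = num_tokens
    revealed = []
    for ratio in cumulative_mask_ratios(num_steps):
        # Assumption: the masked count is rounded down at every step.
        masked_after = math.floor(ratio * num_tokens)
        revealed.append(masked_before - masked_after)
        masked_before = masked_after
    return revealed


def per_step_ratios(num_tokens, num_steps):
    """Per-step share of tokens kept; these shares sum to one by construction."""
    counts = tokens_revealed_per_step(num_tokens, num_steps)
    return [n / num_tokens for n in counts]


if __name__ == "__main__":
    counts = tokens_revealed_per_step(num_tokens=256, num_steps=8)
    print(counts, sum(counts))  # the counts add up to 256
    print(sum(per_step_ratios(num_tokens=256, num_steps=8)))
```

Under these assumptions, a schedule that runs from 1 to 0 and a schedule normalized to sum to one describe the same number of tokens kept per step, so part of my question is whether adap_sche is meant to be the differenced form of the original schedule.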

valeoai / Maskgit-pytorch, issue #19