Closed woolnodaniel closed 1 year ago
Hi! The $t/t_T$ term refers to the mask schedule during sampling: $t$ is the current sampling step, and $t_T$ is the total number of sampling steps.
During training, we follow the same procedure as MaskGIT: we sample some $r$ uniformly from [0, 1), then apply a cosine schedule to $r$ (done in `pmask.random`).
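To make the distinction concrete, here is a minimal sketch of the two cases. It is not the repository's code: the function names, the `floor` rounding, and the use of Python's `random` module as the source of $r$ are all assumptions for illustration, and the cosine schedule is taken to be $\gamma(x) = \cos(\pi x / 2)$ as in MaskGIT.

```python
import math
import random


def cosine_schedule(x: float) -> float:
    """Assumed MaskGIT-style schedule: gamma(0) = 1, gamma(1) = 0."""
    return math.cos(0.5 * math.pi * x)


def num_masked_sampling(t: int, t_total: int, D: int) -> int:
    """Sampling: k = gamma(t / t_T) * D, driven by the current step t.

    The floor rounding is an illustrative choice, not taken from the repo.
    """
    return math.floor(cosine_schedule(t / t_total) * D)


def num_masked_training(D: int, rng: random.Random) -> int:
    """Training: k = gamma(r) * D, with r drawn uniformly from [0, 1).

    Each training example gets its own random r, independent of the
    epoch or iteration count.
    """
    r = rng.random()
    return math.floor(cosine_schedule(r) * D)


if __name__ == "__main__":
    D = 100  # hypothetical number of tokens
    # Sampling: the number of masked tokens shrinks as t approaches t_T.
    print([num_masked_sampling(t, 10, D) for t in range(11)])
    # Training: a different random mask ratio on every draw.
    rng = random.Random(0)
    print([num_masked_training(D, rng) for _ in range(5)])
```

In this sketch the sampling schedule starts fully masked at `t = 0` and reaches zero masked tokens at `t = t_total`, while training covers the whole range of mask ratios at random.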
Apologies if that wasn't clear from the paper. Let me know if you have any more questions!
The VampNet paper says that mask scheduling is performed according to the current iteration: $k = \gamma(t/t_T)D$. However, the implementation (`scripts/exp/train.py`, in `train_loop`, line 214) instead calculates the mask "schedule" as $k = \gamma(r)D$, based on Sobol noise $r$ and not the current epoch. Why the discrepancy?