hugofloresgarcia / vampnet

music generation with masked transformers!
https://hugo-does-things.notion.site/VampNet-Music-Generation-via-Masked-Acoustic-Token-Modeling-e37aabd0d5f1493aa42c5711d0764b33?pvs=4
MIT License

Mask Scheduling implementation doesn't match paper #11

Closed woolnodaniel closed 1 year ago

woolnodaniel commented 1 year ago

The VampNet paper says that mask scheduling is performed according to the current iteration: $k = \gamma(t/t_T)D$. However, in the implementation (scripts/exp/train.py, in train_loop, line 214) we have

r = state.rng.draw(n_batch)[:, 0].to(accel.device)
mask = pmask.random(z, r)

i.e. the implementation instead calculates the mask "schedule" as $k = \gamma(r)D$, based on Sobol noise r and not the current epoch. Why the discrepancy?

hugofloresgarcia commented 1 year ago

Hi! The t/t_T refers to the mask schedule during sampling: t is the current sampling step, and t_T is the total number of sampling steps.
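
As a rough illustration of the sampling-time schedule $k = \gamma(t/t_T)D$, here is a minimal sketch. It is not the repo's code: it assumes $\gamma$ is the MaskGIT-style cosine $\gamma(r) = \cos(\pi r / 2)$, and the names `cosine_schedule` and `tokens_to_mask` are made up for this example.

```python
import math


def cosine_schedule(r: float) -> float:
    """Assumed MaskGIT-style cosine schedule: maps r in [0, 1] to a mask ratio."""
    return math.cos(r * math.pi / 2)


def tokens_to_mask(t: int, total_steps: int, num_tokens: int) -> int:
    """Hypothetical helper: k = gamma(t / t_T) * D, rounded down to an integer."""
    return math.floor(cosine_schedule(t / total_steps) * num_tokens)


# Number of tokens left masked at each sampling step, for D = 1024 and t_T = 8.
counts = [tokens_to_mask(t, total_steps=8, num_tokens=1024) for t in range(9)]
print(counts)
```

Under these assumptions every token is masked at t = 0, and the count shrinks to zero by the final step.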

During training, we follow the same procedure as MaskGIT: we sample some r uniformly from [0, 1), then apply a cosine schedule to r (done in pmask.random).
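
A simplified sketch of that training-time masking, for a single sequence, might look like the following. It makes several assumptions and is not the actual pmask.random: it uses Python's pseudo-random generator instead of Sobol draws, it assumes the cosine is $\gamma(r) = \cos(\pi r / 2)$, it assumes each token is masked independently with probability $\gamma(r)$, and the function name `random_training_mask` is hypothetical.

```python
import math
import random


def random_training_mask(num_tokens: int, rng: random.Random):
    """Hypothetical sketch: draw r ~ U[0, 1), then mask tokens with prob gamma(r)."""
    r = rng.random()  # uniform in [0, 1); the repo draws r from a Sobol sequence
    mask_prob = math.cos(r * math.pi / 2)  # assumed cosine schedule applied to r
    # Assumption: each token is masked independently (1 = masked, 0 = kept).
    mask = [1 if rng.random() < mask_prob else 0 for _ in range(num_tokens)]
    return r, mask_prob, mask


rng = random.Random(0)
r, mask_prob, mask = random_training_mask(num_tokens=16, rng=rng)
print(f"r={r:.3f} mask_prob={mask_prob:.3f} masked={sum(mask)}/16")
```

The point is that the mask ratio in training depends only on the random draw r, not on the training step or epoch.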

Apologies if that wasn't clear from the paper. Let me know if you have any more questions!