xxchauncey opened this issue 2 weeks ago
Hi,
During pretraining, do you apply SpecAugment before the CNN downsampling and then also apply the original HuBERT masking after the CNN? If so, should the mask hyperparameters be decreased, since the original HuBERT has a 20 ms frame shift while FastHuBERT has a 40 ms frame shift?
No. During pretraining, we don't apply SpecAugment before the CNN downsampling; we only apply the original HuBERT masking after the CNN. We also don't change the mask hyperparameters. The main ones are mask_prob and mask_len. mask_prob doesn't need to be changed. A mask_len of 10 works better than 5, so we use mask_len = 10 in all our experiments, even though the frame shift is 40 ms in FastHuBERT.
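To make the frame-shift point in the reply above concrete, here is a minimal sketch of HuBERT/wav2vec 2.0-style span masking. It is a simplified illustration, not fairseq's or FastHuBERT's actual masking code: the helpers `span_mask` and `span_duration_ms`, the example mask_prob of 0.8, and the 10 s utterance are all hypothetical. In this scheme the number of spans scales with the number of frames, so mask_prob keeps roughly the same meaning at either frame shift, while a fixed mask_len of 10 covers 200 ms at a 20 ms shift and 400 ms at a 40 ms shift.

```python
import random


def span_duration_ms(mask_len: int, frame_shift_ms: int) -> int:
    """Duration in milliseconds covered by one masked span of mask_len frames."""
    return mask_len * frame_shift_ms


def span_mask(num_frames: int, mask_prob: float, mask_len: int,
              rng: random.Random) -> list[bool]:
    """Simplified span masking (hypothetical helper, not the fairseq implementation).

    About mask_prob * num_frames / mask_len start frames are sampled without
    replacement; each start masks the next mask_len frames. Spans may overlap,
    so the masked fraction is usually somewhat below mask_prob.
    """
    mask = [False] * num_frames
    if num_frames < mask_len:
        return mask  # utterance too short for a single span
    # Stochastic rounding of the expected span count.
    num_spans = int(mask_prob * num_frames / mask_len + rng.random())
    candidates = range(num_frames - mask_len + 1)  # valid span start frames
    num_spans = min(num_spans, len(candidates))
    for start in rng.sample(candidates, num_spans):
        for t in range(start, start + mask_len):
            mask[t] = True
    return mask


if __name__ == "__main__":
    rng = random.Random(0)
    utterance_ms = 10_000  # hypothetical 10 s utterance
    for name, shift_ms in [("HuBERT (20 ms)", 20), ("FastHuBERT (40 ms)", 40)]:
        frames = utterance_ms // shift_ms
        mask = span_mask(frames, mask_prob=0.8, mask_len=10, rng=rng)
        print(f"{name}: {frames} frames, "
              f"span = {span_duration_ms(10, shift_ms)} ms, "
              f"masked fraction = {sum(mask) / frames:.2f}")
```

Running it prints, for each frame shift, the frame count, the duration of one span, and the masked fraction. The fraction stays in the same range at both shifts, while each span at 40 ms is twice as long in time.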