yanghaha0908 / FastHuBERT

Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
MIT License

mask hyperparameters #5

Open xxchauncey opened 2 weeks ago

xxchauncey commented 2 weeks ago

Hi,

During pretraining, does Fast-HuBERT apply SpecAugment before the CNN downsampling and then also apply the original HuBERT masking after the CNN? If so, should the mask hyperparameters be decreased, since the original HuBERT's frame shift is 20 ms while Fast-HuBERT's is 40 ms?

yanghaha0908 commented 2 days ago

During pretraining, we don't apply SpecAugment before the CNN downsampling; we only apply the original HuBERT masking after the CNN. We don't change the mask hyperparameters. The main ones are mask_prob and mask_len. mask_prob doesn't need to be changed, and a mask_len of 10 works better than 5, so mask_len is always 10 in our experiments, even though the frame shift becomes 40 ms in Fast-HuBERT.
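To make the effect of these hyperparameters concrete, here is a minimal NumPy sketch of HuBERT-style span masking (an illustration of the general technique, not the exact fairseq `compute_mask_indices` implementation; the function name `span_mask` and its exact sampling scheme are assumptions for the example):

```python
import numpy as np

def span_mask(num_frames: int, mask_prob: float = 0.65,
              mask_len: int = 10, seed: int = 0) -> np.ndarray:
    """Illustrative span masking: sample span starts so that roughly
    mask_prob of all frames fall inside a masked span, then mask
    mask_len consecutive frames from each start (spans may overlap
    in real implementations; here starts are distinct for simplicity)."""
    rng = np.random.default_rng(seed)
    # expected number of span starts so masked coverage ~ mask_prob
    num_starts = int(mask_prob * num_frames / mask_len)
    starts = rng.choice(num_frames - mask_len + 1,
                        size=num_starts, replace=False)
    mask = np.zeros(num_frames, dtype=bool)
    for s in starts:
        mask[s:s + mask_len] = True
    return mask

# At a 20 ms frame shift, mask_len=10 masks ~200 ms per span;
# at Fast-HuBERT's 40 ms shift the same mask_len masks ~400 ms.
m = span_mask(200)
```

This makes the trade-off in the answer above visible: keeping mask_len=10 at a 40 ms frame shift doubles the masked duration in seconds, which the authors found to work better than halving mask_len to 5.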