Missing Convolution Subsampling?

lucasnewman / best-rq-pytorch

Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch.

MIT License


Hi Lucas, I've been looking over the code, and I believe the two convolution subsampling layers are missing from conformer.py. From the paper:

4.1.1. NON-STREAMING MODELS The model has two convolution layers at the bottom which provide 4 times temporal-dimension reduction for the input sequences. The rest of the layers are a stack of Conformer models. We explore 0.6B model size which is extensively studied in the previous works. The model contains 24 layers of Conformer models.
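To make the suggestion concrete, here is a rough sketch of the kind of front-end I mean. The excerpt above only specifies two convolution layers and a 4x temporal reduction, so everything else here is my assumption: the `ConvSubsampling` name, the 3x3 kernels with stride 2, the ReLU activations, the channel count, and the final linear projection are all hypothetical and not taken from the repo or confirmed by the paper.

```python
import torch
from torch import nn


class ConvSubsampling(nn.Module):
    """Hypothetical 4x temporal subsampling front-end (a sketch, not the repo's code)."""

    def __init__(self, n_mels: int = 80, dim: int = 512, channels: int = 32):
        super().__init__()
        # Two stride-2 convolutions: each roughly halves the time axis (assumed layout).
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # The frequency axis is downsampled by the same two convolutions.
        freq = n_mels
        for _ in range(2):
            freq = (freq - 1) // 2 + 1
        # Project the flattened (channels * freq) features to the model dimension.
        self.proj = nn.Linear(channels * freq, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_mels)
        x = x.unsqueeze(1)                # (batch, 1, time, n_mels)
        x = self.conv(x)                  # (batch, channels, time', freq')
        b, c, t, f = x.shape
        x = x.transpose(1, 2).reshape(b, t, c * f)
        return self.proj(x)               # (batch, time', dim)


if __name__ == "__main__":
    frames = torch.randn(2, 100, 80)      # 100 input frames of 80 mel bins
    out = ConvSubsampling()(frames)
    print(out.shape)                      # time axis shrinks from 100 to 25
```

The output of a module like this would then feed the stack of Conformer blocks, which see a sequence one quarter as long as the input features.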

If you'd like, I can open a pull request that implements this. Thanks! If I've misunderstood the paper, please call me out! 😅


Missing Convolution Subsampling? #1