Use a multiband filter bank (RAVE uses a PQMF, pseudo-quadrature mirror filter bank) to decompose a 16-kHz signal into N subband signals, each sampled at (16/N) kHz.
RAVE uses this (as did some earlier papers, e.g. Multi-band MelGAN), and it helps both the encoder and the decoder.
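For the encoder, it expands the receptive field of CNNs: each subband sample spans N input samples, so a kernel of size k covers k·N samples of the original signal. It loosely helps RNNs too, since they run over N× fewer time steps.
For the decoder, it makes generation much faster: the model predicts N× fewer time steps (all N bands at once), and a synthesis filter bank recombines the bands into the full-rate waveform.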
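A minimal NumPy sketch of the analysis half of a cosine-modulated (PQMF-style) filter bank, assuming N = 4. The function names `pqmf_analysis_filters` and `pqmf_analysis` are hypothetical, and the prototype settings (`taps=62`, `cutoff=0.142` of Nyquist, Kaiser `beta=9.0`) are assumed typical 4-band values, not RAVE's actual configuration. The matching synthesis bank is not shown.

```python
import numpy as np


def pqmf_analysis_filters(n_bands=4, taps=62, cutoff=0.142, beta=9.0):
    """Cosine-modulated analysis filters built from one low-pass prototype.

    Assumed parameters: taps, cutoff (fraction of Nyquist) and Kaiser beta
    are illustrative 4-band settings, not values taken from RAVE.
    """
    n = np.arange(taps + 1) - taps / 2  # tap index centered on zero
    # Kaiser-windowed ideal low-pass prototype (windowed sinc).
    proto = cutoff * np.sinc(cutoff * n) * np.kaiser(taps + 1, beta)
    filters = np.empty((n_bands, taps + 1))
    for k in range(n_bands):
        # Cosine modulation shifts the prototype to the center of band k.
        filters[k] = 2 * proto * np.cos(
            (2 * k + 1) * np.pi / (2 * n_bands) * n + (-1) ** k * np.pi / 4
        )
    return filters


def pqmf_analysis(x, n_bands=4):
    """Split a 1-D signal into n_bands subbands, each decimated by n_bands."""
    filters = pqmf_analysis_filters(n_bands)
    # Band-pass filter the full-rate signal once per band (same length as x).
    bands = np.stack([np.convolve(x, h, mode="same") for h in filters])
    # Keep every N-th sample: each band now runs at fs / N.
    return bands[:, ::n_bands]


fs = 16_000
t = np.arange(fs) / fs                      # 1 second at 16 kHz
x = np.sin(2 * np.pi * 1_000 * t)           # 1 kHz tone, inside band 0 (0-2 kHz)
bands = pqmf_analysis(x)
print(bands.shape)                          # (4, 4000): four signals at 4 kHz
```

Each band holds 4000 samples instead of 16000, which is where both benefits come from: a size-k kernel on the bands sees k·4 input samples, and a decoder that emits the 4 bands needs 4× fewer steps.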