Open eliran-fm opened 2 months ago
1. I defined the PReLU as follows:
   self.prelu = nn.PReLU(h.n_fft//2+1, init=-0.25)
2. Yes, all the inputs are upsampled to the target SR, which follows the implementation of https://github.com/kuleshov/audio-super-res.
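For readers comparing the two options from the question, here is a minimal sketch of a per-frequency-bin PReLU next to a single-parameter one. The `n_fft` value and the `[batch, freq, time]` tensor layout are assumptions made for illustration; the thread only gives the constructor call, not where the layer sits in the model.

```python
import torch
import torch.nn as nn

n_fft = 400  # hypothetical value; the thread only refers to h.n_fft
n_freq = n_fft // 2 + 1

# One learnable negative-side slope per frequency bin, all starting at -0.25.
per_bin = nn.PReLU(num_parameters=n_freq, init=-0.25)

# The single-parameter alternative raised in the question: one shared slope.
single = nn.PReLU(num_parameters=1, init=-0.25)

# nn.PReLU applies per-channel slopes along dimension 1, so the frequency
# axis is placed there (assumed layout: [batch, freq, time]).
x = torch.randn(2, n_freq, 10)
y = per_bin(x)

per_bin_shape = tuple(per_bin.weight.shape)  # (n_freq,)
single_shape = tuple(single.weight.shape)    # (1,)
min_output = y.min().item()
```

Because the initial slope is negative, a negative input x is mapped to -0.25 * x, which is positive, so the output is non-negative at initialization. Whether it stays that way depends on how the slopes move during training.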
Thanks @yxlu-0102
Following up on 2., is the use of audio-super-res for downsampling supposed to provide a more realistic narrowband version of the inputs?
I think using audio-super-res for downsampling leads to aliasing of high-frequency components, but for a fair comparison we used it in this paper.
In another BWE work of ours, we switched to a sinc filter for the downsampling and interpolation operations to avoid this aliasing.
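To make the aliasing point concrete, here is a small pure-Python sketch. It assumes the aliasing comes from subsampling without an adequate low-pass filter; it is not taken from the audio-super-res code or from either paper. The sample rates, the tone frequency, the filter length, and the helper names are all hypothetical choices for the demonstration.

```python
import cmath
import math

def tone(freq_hz, sr, n):
    """n samples of a unit-amplitude sine at freq_hz, sampled at sr."""
    return [math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]

def bin_magnitude(x, freq_hz, sr):
    """Amplitude estimate at freq_hz via a single DFT bin (unit sine gives ~1)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / sr)
              for i, v in enumerate(x))
    return 2 * abs(acc) / len(x)

def sinc_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc FIR; cutoff in cycles per sample (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = i - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def fir_filter(x, taps):
    """Same-length FIR convolution, treating samples outside x as zero."""
    half = len(taps) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = n + half - k
            if 0 <= j < len(x):
                acc += h * x[j]
        out.append(acc)
    return out

sr_hi, sr_lo, factor = 16000, 8000, 2
x = tone(6000, sr_hi, 1600)  # 6 kHz lies above the 4 kHz Nyquist limit of sr_lo

# Naive path: keep every second sample. The 6 kHz tone folds to 8 - 6 = 2 kHz.
naive = x[::factor]
naive_alias = bin_magnitude(naive, 2000, sr_lo)

# Filtered path: low-pass at 4 kHz (0.25 cycles/sample at 16 kHz), then subsample.
filtered = fir_filter(x, sinc_lowpass(101, 0.25))[::factor]
filtered_alias = bin_magnitude(filtered, 2000, sr_lo)
```

In this sketch the naive path keeps essentially the full tone amplitude at the folded 2 kHz frequency, while the filtered path leaves only a small residue there, mostly from the filter's edge transients.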
Hi, I have two questions regarding the adaptation of MP-SENet for the BWE task:
MaskedDecoder: should it be initialized as a single-parameter PReLU, or should it resemble the way the sigmoid was initialized (with n_fft//2+1 parameters)?
Thanks