Output not good when using predict and provided checkpoints

slp-rl / aero

This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)

MIT License


Output not good when using predict and provided checkpoints #13

Closed Gandhi-Sagar closed 11 months ago

Gandhi-Sagar commented 11 months ago

Hey there!

Awesome project, thank you for sharing.

Not sure if I need to train the models, but I was expecting the provided checkpoints to work out of the box on normal speech data.

Here is a notebook to predict https://colab.research.google.com/drive/1s8nk1Iadwajd3cFoTqis8nZf_F2Tqgaw?usp=sharing

Input: https://drive.google.com/file/d/1c5sM6CoOQfD8OCy7GRr4qJ9RxchllA3B/view?usp=drive_link

Output: https://drive.google.com/file/d/1FCJCXVwKXN3WXKXw7PdBlk7Up_6Ay0jv/view?usp=drive_link

Am I doing something wrong, or do the provided checkpoints just not generalize?

m-mandel commented 11 months ago

Hey! Thank you for using our project.

Listening to the provided samples, it seems there is a problem with the sample rate. Looking at your code, it seems you tried to upsample from 16 to 48 kHz. Is this correct?

That setup won't work as is, since the provided checkpoints support upsampling from 12 to 48 kHz (x4 scale factor). If you are using the 12 -> 48 checkpoint (I couldn't tell from the code), make sure the input is at 12 kHz.

Keep me updated. In the meantime, I'll leave this issue open. Cheers

Gandhi-Sagar commented 11 months ago

@m-mandel thank you for the super quick response. My bad. Yes, resampling the inputs to 12 kHz and then upsampling to 48 kHz works. Cheers!

Thanks again! BTW, I have cleaned up the notebook a little. It's not prod quality or anywhere near that, but it could be useful for others who want to give your project a quick try. Maybe worth adding it to the repo? (I won't be able to maintain it, though.)
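For anyone landing here with the same symptom, the fix discussed in this thread (bring the input to 12 kHz before running the 12 -> 48 checkpoint) can be sketched roughly as below. This is only an illustration and not code from the repo or the notebook: the constants and the `resample_ratio` / `needs_resampling` helpers are hypothetical names, and the 12 kHz input rate is taken from the maintainer's comment above.

```python
from fractions import Fraction
from typing import Tuple

# Assumption taken from the thread: the provided checkpoints
# upsample 12 kHz -> 48 kHz (x4 scale factor).
MODEL_INPUT_SR = 12_000
MODEL_OUTPUT_SR = 48_000


def needs_resampling(source_sr: int, model_input_sr: int = MODEL_INPUT_SR) -> bool:
    """Return True if the audio must be resampled before prediction."""
    return source_sr != model_input_sr


def resample_ratio(source_sr: int, target_sr: int = MODEL_INPUT_SR) -> Tuple[int, int]:
    """Return the reduced (up, down) integer factors that map source_sr to target_sr."""
    if source_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    ratio = Fraction(target_sr, source_sr)  # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator


if __name__ == "__main__":
    source_sr = 16_000  # the rate of the input file in the original report
    if needs_resampling(source_sr):
        up, down = resample_ratio(source_sr)
        print(f"Resample {source_sr} Hz -> {MODEL_INPUT_SR} Hz with up={up}, down={down}")
    print(f"Model scale factor: x{MODEL_OUTPUT_SR // MODEL_INPUT_SR}")
```

For a 16 kHz file this gives `up=3, down=4`. That integer pair is the form a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)` takes, so the actual resampling step can be done with whichever audio library you already use.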