slp-rl / aero

This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)
MIT License

Predict on CPU #15

Open patriotyk opened 1 year ago

patriotyk commented 1 year ago

As I understand from the code, CUDA support is hardcoded, so I changed the device to cpu and replaced model.cuda() with model.cpu(). But when I run predict, I get a strange error:

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 2, 160]

I don't know whether this is a problem with the CPU or something else.

m-mandel commented 1 year ago

Hi, thank you for trying our code! This might be because of the dimensions of your input. If I recall correctly, we assume that the wav file the path points to is a single-channel (mono) wav file. Could it be that your input is stereo instead of mono?

Also, the audio file should not be too short; I think it should be at least 1 second long. Let me know if this helps, and share anything else that would help me reproduce the bug myself.

Best, M
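
For readers who hit the same padding error, the two conditions mentioned above (mono input, at least about 1 second long) can be checked before running predict. The following is only a minimal sketch using Python's standard `wave` module, which reads PCM wav files only. The names `describe_wav` and `check_input` and the 1-second threshold are illustrative assumptions, not part of the aero code.

```python
import wave

# Assumption: "at least 1 second" is the rough guideline given in the thread,
# not a limit taken from the aero code itself.
MIN_SECONDS = 1.0


def describe_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM wav file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate
    return channels, sample_rate, duration


def check_input(path, min_seconds=MIN_SECONDS):
    """Return a list of problems found; an empty list means the file looks usable."""
    channels, _, duration = describe_wav(path)
    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if duration < min_seconds:
        problems.append(f"too short: {duration:.2f} s < {min_seconds} s")
    return problems
```

Calling `check_input("input.wav")` (a hypothetical file name) on a stereo or very short file returns one message per problem, which is easier to act on than the padding error raised deep inside the model.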

patriotyk commented 1 year ago

Yes, you are right, my input was stereo, thank you. Now it works, but the output is much worse than the original.

m-mandel commented 1 year ago

Which checkpoint were you using? What are the source and target sample rates?

patriotyk commented 1 year ago

I use this checkpoint https://drive.google.com/drive/folders/1JK9VqgfQsWEPOFUkp9Y5OR62G9i3disf

The source is 12 kHz and the output file is generated at 16 kHz. As I understand it, I ran the wrong command:

python predict.py dset=4-16 experiment=aero_4-16_512_256 

but it should be

python predict.py dset=12-48 experiment=aero_12-48_512_256

But this repository only contains 4-16 experiments and no 12-48 experiment files. Did you forget to add them, or should I create them manually?

m-mandel commented 1 year ago

Yes, you are right - you need to modify the configuration file. If I recall correctly, the only values you need to change are the sampling rates. From:

lr_sr: 4000 # low resolution sample rate, added to support BWE. Should be included in training cfg
hr_sr: 16000 # high resolution sample rate. Should be included in training cfg

to:

lr_sr: 12000 # low resolution sample rate, added to support BWE. Should be included in training cfg
hr_sr: 48000 # high resolution sample rate. Should be included in training cfg
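
The edit above can also be scripted when deriving a 12-48 config from an existing 4-16 one. This is only a rough sketch that works on the config text: the helper name `set_sample_rates` is hypothetical, and the thread does not confirm where the config file lives or whether other fields (such as the experiment name) also need to change.

```python
import re


def set_sample_rates(cfg_text, lr_sr, hr_sr):
    """Replace the lr_sr and hr_sr values in config text, keeping trailing comments."""
    for key, value in (("lr_sr", lr_sr), ("hr_sr", hr_sr)):
        # Match "<key>: <digits>" at the start of a line, capturing the prefix.
        pattern = rf"^([ \t]*{key}:[ \t]*)\d+"
        cfg_text, count = re.subn(
            pattern, rf"\g<1>{value}", cfg_text, flags=re.MULTILINE
        )
        if count != 1:
            raise ValueError(f"expected exactly one '{key}:' line, found {count}")
    return cfg_text


original = (
    "lr_sr: 4000 # low resolution sample rate\n"
    "hr_sr: 16000 # high resolution sample rate\n"
)
updated = set_sample_rates(original, 12000, 48000)
print(updated)
```

The function raises a `ValueError` if either key is missing or duplicated, so a config with a different layout fails loudly instead of being silently left at the 4-16 rates.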