slp-rl / aero

This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)
MIT License

Model Comparison #8

Closed psp0001060 closed 1 year ago

psp0001060 commented 1 year ago

I am very interested in your paper, thank you for sharing.

May I ask how the different models were compared? For example, how many epochs are appropriate for training nuwave2? Was the code of the comparison models (such as nuwave2) merged into the AERO code to produce the comparison results?

Ma5onic commented 1 year ago

Check out this repo: https://github.com/ZFTurbo/Audio-separation-models-checker

It compares the SDR of the output from audio separation models, like aero and demucs, against the ground truths of the audio.

m-mandel commented 1 year ago

Hi, I reached out to the authors of nuwave2, and they told me that they trained nuwave2 for "1.2M ~ 1.5M steps." So, given the size of the dataset and the chosen batch size, training from scratch would mean running for that many steps. If I recall correctly, since I use the same VCTK dataset as they did, it was sufficient for me to use the pre-trained checkpoints, and I didn't need to train from scratch.

For the SEANet model, I used a similar approach. In the paper, they mention: "In all our experiments, we train for 1 million steps with a batch size of 16 using the same optimizer parameters and ..". So, given the batch size and the dataset, I chose the number of epochs so that n_epochs * n_steps_per_epoch = 1M steps in total.
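
As a rough illustration of that arithmetic, here is a minimal Python sketch that turns a target step count into a number of epochs. The function name and the dataset size of 40,000 training examples are hypothetical placeholders chosen for the example, not values from the AERO or SEANet code; only the 1M steps and the batch size of 16 come from the quote above.

```python
import math


def epochs_for_target_steps(target_steps, n_examples, batch_size, drop_last=False):
    """Smallest number of epochs whose total step count reaches target_steps.

    Hypothetical helper for illustration; not part of the AERO repository.
    """
    if drop_last:
        # An incomplete final batch is discarded each epoch.
        steps_per_epoch = n_examples // batch_size
    else:
        # An incomplete final batch still counts as one step.
        steps_per_epoch = math.ceil(n_examples / batch_size)
    if steps_per_epoch == 0:
        raise ValueError("dataset is smaller than one batch")
    return math.ceil(target_steps / steps_per_epoch)


# Assumed dataset size (placeholder): 40,000 training examples.
# 40,000 / 16 = 2,500 steps per epoch, so 1M steps needs 400 epochs.
n_epochs = epochs_for_target_steps(1_000_000, n_examples=40_000, batch_size=16)
print(n_epochs)
```

The actual number of steps per epoch depends on how the training set is segmented into examples, so in practice it is easiest to read it from the length of the training data loader and plug that in.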

I did not incorporate the code of nuwave2 into my code, but rather used the code from their git.

Hope this helps,
The author