slp-rl / aero

This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)
MIT License

Many related small questions #17

Closed: shenberg closed this issue 11 months ago

shenberg commented 11 months ago

Hiya, thanks for your great work; it really does work well.

I was trying to train my own model and ran into a few small hitches that I hope to fix for future users of the code:

  1. When going over the instructions for resampling the VCTK files, it took me a while to figure out that you need to pass the full sample rate in Hz (e.g. 16000), not the sample rate in kHz (16) as given in the example command lines. A small update to the README would help future users (e.g. --target_sr 4 -> --target_sr 4000).
  2. The scripts also require converting the audio from .flac (the VCTK default) to .wav, even though sox itself has no issue reading .flac files.
  3. Creating the configuration and dataset .yaml files: I copied the provided 4-16 files and modified them appropriately.
  4. One of the files, p271/p271_069_mic1.wav, has exactly 96001 samples. For the 12-48 kHz task, its length is rounded up to 2 sections in the high-res dataset, but after downsampling it is rounded down to 24000 samples, so it yields only one section. Unfortunately, this breaks the training code. I fixed it by manually trimming the file by one sample (sox p271_069_mic1.wav p271_trimmed.wav trim 0 -1s) and regenerating the .egs files.
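For the .flac to .wav step in item 2, here is a minimal sketch that mirrors a VCTK directory tree and calls sox once per file. It assumes the sox binary is on PATH and was built with FLAC support; the helper names (wav_path_for, convert_tree) and the directory paths are hypothetical and not part of the aero scripts.

```python
import subprocess
from pathlib import Path


def wav_path_for(flac: Path, src_root: Path, dst_root: Path) -> Path:
    """Map src_root/<speaker>/<name>.flac to dst_root/<speaker>/<name>.wav."""
    return dst_root / flac.relative_to(src_root).with_suffix(".wav")


def convert_tree(src_root: str, dst_root: str) -> int:
    """Convert every .flac under src_root to .wav under dst_root using sox.

    Hypothetical helper: requires the `sox` binary on PATH.
    Returns the number of files converted.
    """
    src, dst = Path(src_root), Path(dst_root)
    count = 0
    for flac in sorted(src.rglob("*.flac")):
        wav = wav_path_for(flac, src, dst)
        # Keep the per-speaker folder layout (e.g. p271/) in the output tree.
        wav.parent.mkdir(parents=True, exist_ok=True)
        # `sox in.flac out.wav` converts the file format; sox keeps the
        # input sample rate unless an explicit rate is requested.
        subprocess.run(["sox", str(flac), str(wav)], check=True)
        count += 1
    return count
```

Calling convert_tree("VCTK/flac", "VCTK/wav") (paths hypothetical) would produce a parallel .wav tree that the resampling step can then read.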
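The mismatch in item 4 is easy to reproduce with a little arithmetic. This is a minimal sketch, not the aero data loader. It assumes that sections are 2 seconds long and counted by rounding up, and that the downsampled length is rounded down. Under those assumptions, 2 seconds is the only section length consistent with the numbers above (96001 samples giving 2 sections at 48 kHz, 24000 samples giving 1 at 12 kHz), but the real value comes from the dataset config. The helper names are hypothetical.

```python
import math

# Assumption: section length in seconds (the real value lives in the dataset config).
SECTION_SECONDS = 2


def n_sections(n_samples: int, sr: int, section_seconds: float = SECTION_SECONDS) -> int:
    """Number of sections when a partial final section is rounded up."""
    section_len = int(section_seconds * sr)
    return math.ceil(n_samples / section_len)


def downsampled_length(n_samples: int, hr_sr: int, lr_sr: int) -> int:
    """Length after downsampling from hr_sr to lr_sr, rounded down."""
    return n_samples * lr_sr // hr_sr


def sections_match(n_samples: int, hr_sr: int = 48000, lr_sr: int = 12000) -> bool:
    """True if the high-res file and its downsampled copy give the same section count."""
    lr_samples = downsampled_length(n_samples, hr_sr, lr_sr)
    return n_sections(n_samples, hr_sr) == n_sections(lr_samples, lr_sr)


if __name__ == "__main__":
    for n in (96000, 96001):
        lr = downsampled_length(n, 48000, 12000)
        print(f"{n} -> {lr} samples, {n_sections(n, 48000)} vs {n_sections(lr, 12000)} sections, match={sections_match(n)}")
    # 96000 -> 24000 samples, 1 vs 1 sections, match=True
    # 96001 -> 24000 samples, 2 vs 1 sections, match=False
```

Any length that is a multiple of the downsampling factor (4 here) gives matching counts, which is why trimming the file to 96000 samples fixes it. Running sections_match over every file's length before regenerating the .egs files would flag any other offenders.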

I was left with a few questions:

  1. The paper gives nfft=1024, hop=256 and batch size 8 for the 12-48 setting, but the models you provide in the drive say 512/256 and 512/128. Which is correct?
  2. Does the code support resuming a training run?

Thanks again for the paper and the code - the results are good and so is the code.

shenberg commented 11 months ago

Ok, answering myself here:

  1. The provided models store their config, and it matches their names.
  2. There is a parameter called continue_from that resumes training from an existing checkpoint.

I'll open a PR with the doc fixes.