slp-rl / aero

This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)
MIT License

Musdb model #16

Open hayduck opened 9 months ago

hayduck commented 9 months ago

Hello,

I'm trying to use predict to improve some old music I have, as was done here in your project:

Section Ⅴ: Examples for samples upsampled from 11.025kHz to 44.1kHz. The model is trained on the train set of the MusDB-HQ dataset.

but I think I need a musdb experiment yaml file. I was able to download the checkpoint.tf and tried to use the output naming convention to predict, but I don't believe there is a matching experiment yaml file. The dset training hydra config would be nice too, if possible.

Thanks much, and cool project.

hayduck commented 9 months ago

If the experiment config is the same as the others, with just different input and output frequencies, I'm happy to give that a shot and make a PR. I just have no idea if there are other changes.

yihaoch commented 7 months ago

Same here. Looking for the music upscaling model.

pf-mpa commented 4 months ago

Hello, I would also be interested in running the model trained on music data. Are there any updates on this?

Ma5onic commented 2 months ago

@hayduck @yihaoch @pf-mpa, the author did answer this question in another issue: https://github.com/slp-rl/aero/issues/5#issuecomment-1513243707

They created a musdb-mixture-11-44.yaml file in the dset folder for musdb containing the following:

# @package dset
name: musdb-mixture-11-44
train: egs/musdb18hq/11025-44100_mixture/tr
valid: egs/musdb18hq/11025-44100_mixture/val
test: egs/musdb18hq/11025-44100_mixture/val
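A minimal Python sketch for dropping that file into place and checking that the data folders it points to exist before training. The helper names `write_dset_config` and `missing_data_dirs` are hypothetical (not part of aero), and the folder you pass in (for example `conf/dset`) is an assumption about where the repo keeps its dset configs.

```python
from pathlib import Path

# Contents copied from the config quoted above.
DSET_YAML = """\
# @package dset
name: musdb-mixture-11-44
train: egs/musdb18hq/11025-44100_mixture/tr
valid: egs/musdb18hq/11025-44100_mixture/val
test: egs/musdb18hq/11025-44100_mixture/val
"""


def write_dset_config(dset_dir):
    """Write musdb-mixture-11-44.yaml into dset_dir and return its path."""
    dset_dir = Path(dset_dir)
    dset_dir.mkdir(parents=True, exist_ok=True)
    path = dset_dir / "musdb-mixture-11-44.yaml"
    path.write_text(DSET_YAML)
    return path


def missing_data_dirs(config_path, repo_root="."):
    """Return the train/valid/test paths from the config that do not exist."""
    missing = []
    for line in Path(config_path).read_text().splitlines():
        key, sep, value = line.partition(":")
        # Lines without a colon (such as the "# @package" header) are skipped.
        if sep and key.strip() in ("train", "valid", "test"):
            target = Path(repo_root) / value.strip()
            if not target.exists() and str(target) not in missing:
                missing.append(str(target))
    return missing
```

Calling `missing_data_dirs(write_dset_config("conf/dset"))` from the repo root should return an empty list once the data preparation step has produced the `tr` and `val` folders.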

It doesn't look like they used an experiment file directly; instead, they specified the options as command-line arguments like this:

python train.py \
  dset=musdb-mixture-11-44 \
  experiment=<experiment_name> \
  experiment.nfft=512 \
  experiment.hop_length=64 \
  experiment.lr_sr=11025 \
  experiment.hr_sr=44100 \
  epochs=696 \
  eval_every=175 \
  losses=[stft] \
  experiment.batch_size=16 \
  cross_valid_every=5 \
  wandb.resume=false \
  experiment.aero.spec_upsample=true \
  experiment.upsample=false \
  experiment.aero.enc_freq_attn=0 \
  experiment.aero.norm_starts=2 \
  experiment.aero.dconv_time_attn=2 \
  experiment.aero.dconv_lstm=2 \
  experiment.aero.freq_ends=4 \
  experiment.aero.strides=[4,4,2,2] \
  experiment.aero.channels=48 \
  experiment.melgan_discriminator.ndf=16 \
  +experiment.speech_mode=false \
  cross_valid=false \
  joint_evaluate_and_enhance=true \
  ddp=true \
  visqol=false

Note: I am not yet sure whether <experiment_name> is the file name of a yaml experiment config that is being overridden, or the name: value for the experiment.
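For what it's worth, Hydra's usual convention is that an override of the form group=option selects a yaml file from that config group, so experiment=<experiment_name> would presumably pick a file from the experiment config folder and the experiment.* options would then be applied on top of it. Under that assumption, here is a rough Python sketch that collects the experiment.* overrides from the command above into a nested yaml fragment. The helper names `to_nested` and `to_yaml` are hypothetical, and the output only holds the overridden keys; a complete experiment file would presumably still need the remaining defaults from one of the existing experiment configs.

```python
# Overrides copied from the command above (a few top-level ones included
# to show that they are left out of the experiment fragment).
OVERRIDES = [
    "experiment.nfft=512",
    "experiment.hop_length=64",
    "experiment.lr_sr=11025",
    "experiment.hr_sr=44100",
    "epochs=696",
    "losses=[stft]",
    "experiment.batch_size=16",
    "experiment.aero.spec_upsample=true",
    "experiment.upsample=false",
    "experiment.aero.enc_freq_attn=0",
    "experiment.aero.norm_starts=2",
    "experiment.aero.dconv_time_attn=2",
    "experiment.aero.dconv_lstm=2",
    "experiment.aero.freq_ends=4",
    "experiment.aero.strides=[4,4,2,2]",
    "experiment.aero.channels=48",
    "experiment.melgan_discriminator.ndf=16",
    "+experiment.speech_mode=false",
]


def to_nested(overrides, prefix="experiment."):
    """Collect prefix-ed key=value overrides into a nested dict of strings."""
    tree = {}
    for item in overrides:
        # A leading "+" is Hydra's marker for adding a new key; drop it here.
        key, _, value = item.lstrip("+").partition("=")
        if not key.startswith(prefix):
            continue  # top-level options (epochs, losses, ...) stay on the CLI
        *parents, leaf = key[len(prefix):].split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


def to_yaml(tree, indent=0):
    """Return YAML lines for the nested dict; values are written verbatim."""
    lines = []
    pad = "  " * indent
    for key, value in tree.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


if __name__ == "__main__":
    print("\n".join(to_yaml(to_nested(OVERRIDES))))
```

Values are kept as the strings typed on the command line, which works here because true, 512 and [4,4,2,2] are already valid yaml scalars and flow lists.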

I'm currently training another model, but I'll make a PR with a yaml file containing those experiment options when I get around to trying this again. It would also be interesting to upgrade the hdemucs model used by aero to the newest htdemucs, which has a far better SDR.

Training Hint: Consider augmenting your MUSDB18 dataset before running the aero resample.py data preparation script. Useful tools:

  • demucs automix.py (requires local demucs install with pip install -e .) creates musically plausible mashups
  • Spotify's pedalboard can be used for "on-the-fly" augmentations during training (example: augm_data() function)
  • audiomentations can also be used for "on-the-fly" audio augmentations (see previous augm_data() example)
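To illustrate what an "on-the-fly" augmentation amounts to, here is a tiny pure-Python sketch that applies a random gain and a random polarity flip each time a clip is fetched. It is a stand-in only: `augm_sketch` is a hypothetical name, it is not the augm_data() function mentioned above, and it does not use the pedalboard or audiomentations APIs. It assumes mono audio as a list of floats in [-1, 1].

```python
import random


def augm_sketch(samples, rng, max_gain_db=6.0, flip_prob=0.5):
    """Return a randomly gain-scaled, possibly polarity-inverted copy."""
    # Draw a gain in decibels and convert it to a linear amplitude factor.
    gain_db = rng.uniform(-max_gain_db, max_gain_db)
    gain = 10.0 ** (gain_db / 20.0)
    # Invert the waveform's polarity with probability flip_prob.
    sign = -1.0 if rng.random() < flip_prob else 1.0
    # Clip to [-1, 1] so boosted audio stays in the usual float range.
    return [max(-1.0, min(1.0, sign * gain * s)) for s in samples]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    clip = [0.0, 0.1, -0.2, 0.05]
    print(augm_sketch(clip, rng))
```

Because the model learns from low/high sample rate pairs, a transform like this would presumably need to be applied to the source audio before the low-rate version is derived, so both halves of a pair see the same change.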