
Music Enhancement via Image Translation and Vocoding

Project Overview

The music enhancement project transforms music recorded with consumer-grade microphones in reverberant, noisy environments into music that sounds as if it were recorded in a recording studio.

Environment

Data

  1. Choose a local data directory and run mkdir -p <local_data_directory>/corruptions/noise, mkdir -p <local_data_directory>/corruptions/reverb, and mkdir -p <local_data_directory>/medley-solos-db

  2. Download noise data from the ACE challenge dataset

    • Register to download the data from the ACE challenge website: http://www.ee.ic.ac.uk/naylor/ACEweb/index.html. Note that this dataset contains more than just noise, but for this project we only use the noise samples.
    • Move the ace-ambient and ace-babble noise samples to <local_data_directory>/corruptions/noise
  3. Download the room impulse response data from the DNS challenge dataset

    • Download the data from the DNS challenge repository: https://github.com/microsoft/DNS-Challenge. Note that this dataset contains more than just RIRs, but for this project we only use the room impulse responses.
    • Move the small and medium room RIRs to <local_data_directory>/corruptions/reverb
  4. Split the noise and reverb data into train, validation, and test

    • python -m scripts.split_data reverb <local_data_directory>/corruptions/reverb/small-room <local_data_directory>/corruptions/reverb/medium-room <local_data_directory> --rate 16000 --validation_fraction 0.1 --test_fraction 0.1
    • python -m scripts.split_data noise <local_data_directory>/corruptions/noise/ace-ambient <local_data_directory>/corruptions/noise/ace-babble <local_data_directory>/corruptions/noise/demand <local_data_directory> --rate 16000 --noise_sample_length 47555 --validation_fraction 0.1 --test_fraction 0.1
  5. Download Medley-Solos-DB from https://zenodo.org/record/1344103#.Yg__Yi-B1QI. Put the data in <local_data_directory>/medley-solos-db.

After these steps, <local_data_directory> should contain two .npz files holding the reverb and noise datasets, plus a <local_data_directory>/medley-solos-db directory containing the Medley-Solos-DB music dataset.
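The .npz file names are produced by the split script and are not listed here, so the hypothetical sanity check below simply globs for them; it assumes only NumPy and the directory layout described above.

```python
import glob
import os

import numpy as np

# Hypothetical sanity check (not part of the repository): confirm that the
# split step produced the reverb and noise .npz archives and that the
# Medley-Solos-DB directory exists. Set this to the <local_data_directory>
# chosen above.
local_data_directory = "/path/to/local_data_directory"

for path in sorted(glob.glob(os.path.join(local_data_directory, "*.npz"))):
    archive = np.load(path)
    print(path, {name: archive[name].shape for name in archive.files})

assert os.path.isdir(os.path.join(local_data_directory, "medley-solos-db"))
```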

Training

For the default batch sizes, it is recommended to train on a machine with 4 Tesla V100 GPUs.

In each of the sample command lines below, one of the positional arguments is a run directory that collects the artifacts of the training run. Checkpoints from each epoch are stored in <run_dir>/checkpoints, samples generated after each epoch are stored in <run_dir>/samples, and tensorboard data is stored in <run_dir>/tb.
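To monitor a run while it trains, you can point the standard TensorBoard CLI at this directory (this is generic TensorBoard usage, not a script provided by this repo):

tensorboard --logdir <run_dir>/tb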

Pre-trained Models

As part of this project, we are releasing fully trained music enhancement models at https://www.dropbox.com/s/64bkwdh89wqysgh/model-checkpoints.tgz?dl=0. The following table summarizes the various models being released:

| Model Name | Model Type | Instruments | Notes |
| --- | --- | --- | --- |
| diffwave_vocoder_all_instruments.pt | Diffwave | All | |
| diffwave_vocoder.pt | Diffwave | Piano | |
| mel2mel_all_instruments.pt | Mel2Mel | All | |
| mel2mel.pt | Mel2Mel | Piano | |
| sequential_training.pt | Mel2Mel + Diffwave | Piano | Diffwave trained to convergence; an uninitialized Mel2Mel is then prepended to the Diffwave input and the concatenated models are trained with the Diffwave objective |
| joint_finetuning.pt | Mel2Mel + Diffwave | Piano | Diffwave and Mel2Mel trained independently, then finetuned jointly with the Diffwave objective |
| joint_training.pt | Mel2Mel + Diffwave | Piano | Diffwave and Mel2Mel trained jointly from scratch with the Diffwave objective |

Each of these models can be used to generate enhanced samples. For checkpoints that contain both a Mel2Mel and a vocoder, pass the same checkpoint file as both the Mel2Mel source and the vocoder source.
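Before wiring a downloaded checkpoint into the sampling scripts, a minimal inspection sketch like the following can help, assuming the released .pt files are ordinary PyTorch checkpoints (the exact key layout is not documented here):

```python
import torch

# Hypothetical inspection snippet: assumes the released .pt files are
# standard PyTorch checkpoints. The exact contents (state dicts, nested
# mel2mel/vocoder entries, etc.) are not documented here, so print and
# inspect before use.
checkpoint = torch.load("joint_finetuning.pt", map_location="cpu")
print(type(checkpoint))
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))
```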

Generating Enhanced Samples

To generate an enhanced version of a particular .wav file, use the following command: