
midi2wave

This is a PyTorch implementation of the "midi2wave" Wavenet component outlined in the paper Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset. midi2wave is a Wavenet conditioned on midi data, which can synthesize professional-sounding piano audio. The preprocessing tools provided generate Wavenet input from the Maestro dataset, a large, high-quality dataset of piano audio and midi data.
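As background for how midi conditioning enters the model, below is a minimal sketch of one gated, dilated Wavenet layer with a conditioning input, in the style of the original WaveNet paper. The class name, channel sizes, and cond interface are illustrative assumptions, not code from this repo.

import torch
import torch.nn as nn

class GatedResidualLayer(nn.Module):
    """One dilated causal convolution with gated activation and conditioning.
    Hypothetical sketch; names and shapes are not taken from this repo."""
    def __init__(self, channels, cond_channels, dilation):
        super().__init__()
        # Dilated convolution over the audio; the extra padding is trimmed
        # in forward() to keep the layer causal.
        self.dilated = nn.Conv1d(channels, 2 * channels, kernel_size=2,
                                 dilation=dilation, padding=dilation)
        # 1x1 projection of the (frame-aligned) midi conditioning features.
        self.cond_proj = nn.Conv1d(cond_channels, 2 * channels, kernel_size=1)
        self.res = nn.Conv1d(channels, channels, kernel_size=1)
        self.skip = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x, cond):
        # x: (batch, channels, time); cond: (batch, cond_channels, time)
        h = self.dilated(x)[:, :, :x.size(2)] + self.cond_proj(cond)
        filt, gate = h.chunk(2, dim=1)
        h = torch.tanh(filt) * torch.sigmoid(gate)    # gated activation unit
        return x + self.res(h), self.skip(h)          # residual and skip paths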

State of project:

Training issues:

Components of midi2wave not yet implemented:

Dependencies

PyTorch is required. The nv-wavenet CUDA inference module is optional (see Testing).

Preprocessing

First resample the Maestro audio, then create the output directories and preprocess the training data:

python resample_audio.py -d /path/to/maestro-v1.0.0
mkdir -p data/train data/validation data/test
python preprocess_audio.py -d train -c config_makeTrain.json 
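For reference, the resampling step conceptually looks like the sketch below. The 16 kHz target rate is an assumption based on the Maestro paper's Wavenet setup, not a value read from resample_audio.py.

import librosa
import soundfile as sf

# Hypothetical sketch of the resampling step: load a Maestro wav and write
# a lower-rate copy. The 16 kHz default follows the Maestro paper's Wavenet
# setup and is an assumption here, not a value taken from this repo.
def resample(in_path, out_path, target_sr=16000):
    audio, _ = librosa.load(in_path, sr=target_sr)  # librosa resamples on load
    sf.write(out_path, audio, target_sr)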

Training

Training parameters are stored in config_train.json. The Wavenet configured there is a Wavenet autoencoder: an encoder Wavenet transforms midi into a latent code, and an autoregressive decoder Wavenet generates audio conditioned on that code. The audio Wavenet is nearly the same as the default nv-wavenet, but with a logistic mixture output. The encoder Wavenet follows all specifications the Maestro paper gives for its "context stack" Wavenet. The decoder Wavenet is smaller than the Maestro paper's audio Wavenet due to GPU memory limitations.
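As a point of reference for the logistic mixture output mentioned above, a simplified discretized mixture-of-logistics negative log-likelihood (in the PixelCNN++ style) is sketched below. The tensor shapes, 16-bit bin count, and dropped edge cases at -1/+1 are assumptions and simplifications, not this repo's exact loss.

import torch
import torch.nn.functional as F

def mol_nll(logit_pi, mu, log_s, x, num_bins=65536):
    """Simplified discretized mixture-of-logistics NLL. Assumed shapes:
    logit_pi, mu, log_s are (batch, mixtures, time); x is (batch, time)
    scaled to [-1, 1]. Edge-case handling at the extremes is omitted."""
    x = x.unsqueeze(1)                  # broadcast over the mixture dimension
    half_bin = 1.0 / (num_bins - 1)     # half-width of one quantization bin
    inv_s = torch.exp(-log_s)
    # Probability mass each component assigns to the target's bin:
    # logistic CDF at the bin's upper edge minus CDF at its lower edge.
    cdf_plus = torch.sigmoid(inv_s * (x + half_bin - mu))
    cdf_minus = torch.sigmoid(inv_s * (x - half_bin - mu))
    prob = torch.clamp(cdf_plus - cdf_minus, min=1e-12)
    # Mix components in log space, then average the NLL over batch and time.
    log_prob = torch.logsumexp(torch.log(prob) + F.log_softmax(logit_pi, dim=1), dim=1)
    return -log_prob.mean()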

To train the Wavenet run:

python train.py -c config_train.json

Testing

I've provided an inference module for audio generation. It's possible to use the nv-wavenet CUDA inference module instead, but one should take care that the specified Wavenet parameters are compatible with the available nv-wavenet architectures.

My testing procedure has been to generate 4s audio samples: the first 2s using teacher forcing, the second 2s in autoregressive mode. This should help the Wavenet generate audio by providing it with a history of 'good' samples to begin autoregression from. Parameters for running this experiment are in the provided config_inference.json. To make test data and then run inference:

mkdir -p test/4s
python preprocess_audio.py -d test -c config_makeTest.json
python inference.py -c config_inference.json
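To make the teacher-forced warmup plus autoregressive continuation concrete, here is a hypothetical sketch of that sampling loop. The model(audio, cond) -> (logit_pi, mu, log_s) interface and all tensor shapes are assumptions, not the inference module's actual API.

import torch

def sample_logistic(logit_pi, mu, log_s):
    # Draw one sample from a (..., K) logistic mixture via the inverse CDF.
    k = torch.distributions.Categorical(logits=logit_pi).sample().unsqueeze(-1)
    mu_k = mu.gather(-1, k).squeeze(-1)
    s_k = log_s.gather(-1, k).squeeze(-1).exp()
    u = torch.rand_like(mu_k).clamp(1e-5, 1 - 1e-5)
    return mu_k + s_k * (torch.log(u) - torch.log1p(-u))

@torch.no_grad()
def generate(model, cond, warmup, total_len):
    # warmup: (1, warmup_len) ground-truth audio used for teacher forcing;
    # cond: (1, cond_channels, total_len) midi conditioning features.
    audio = warmup.clone()
    for t in range(warmup.size(1), total_len):
        # Assumed interface: mixture parameters per step, each (batch, time, K);
        # condition on every sample generated so far and take the last step.
        logit_pi, mu, log_s = model(audio, cond[:, :, :t])
        nxt = sample_logistic(logit_pi[:, -1], mu[:, -1], log_s[:, -1])
        audio = torch.cat([audio, nxt.unsqueeze(1)], dim=1)
    return audio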

The inference module can also output teacher-forced audio in train mode, to quickly assess model audio quality.