This is a PyTorch implementation of Google's Onsets and Frames model, using the Maestro dataset for training and the Disklavier portion of the MAPS database for testing.
This project is quite resource-intensive; at least 32 GB of system memory and 8 GB of GPU memory are recommended.
The data subdirectory already contains the MAPS database. To download the Maestro dataset, first make sure that the ffmpeg executable is available, then run the prepare_maestro.sh script:
ffmpeg -version
cd data
./prepare_maestro.sh
This will download the full Maestro dataset from Google's server, then automatically unzip it and encode the audio as FLAC files to save storage. However, you'll still need about 200 GB of space for intermediate storage.
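Before starting the download, it can help to verify both prerequisites up front. The following is a minimal sketch and is not part of the repository: the helper name `check_prerequisites` is hypothetical, and the 200 GB threshold is simply the rough estimate quoted above.

```python
import shutil

REQUIRED_GB = 200  # rough intermediate-storage estimate quoted above


def check_prerequisites(path=".", required_gb=REQUIRED_GB):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    # shutil.which returns None when the executable is not on PATH
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg executable not found on PATH")
    # disk_usage reports bytes; convert the free space to gigabytes
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < required_gb:
        problems.append(
            f"only {free_gb:.0f} GB free at {path!r}, about {required_gb} GB needed"
        )
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites("."):
        print("warning:", problem)
```

Running the check from the directory that will hold the dataset gives the most meaningful free-space figure, since the download may live on a different volume than the code.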
All package requirements are listed in requirements.txt. To train the model, run:
pip install -r requirements.txt
python train.py
train.py is written using sacred, and accepts configuration options such as:
python train.py with logdir=runs/model iterations=1000000
Trained models will be saved in the specified logdir, or otherwise in a timestamped directory under runs/.
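To illustrate how `key=value` options and the timestamped fallback fit together, here is a simplified stand-in written in plain Python. It is neither sacred's actual parser nor the code in train.py; the function name `resolve_config`, the default iteration count, and the timestamp format are all assumptions chosen for illustration.

```python
import ast
from datetime import datetime


def resolve_config(overrides, now=None):
    """Merge `key=value` overrides into hypothetical defaults."""
    now = now or datetime.now()
    config = {
        # hypothetical fallback: a timestamped directory under runs/
        "logdir": "runs/" + now.strftime("%y%m%d-%H%M%S"),
        "iterations": 500000,  # placeholder default, not the project's value
    }
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            # interpret numbers, booleans, etc. as Python literals
            config[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # anything else (such as a path) stays a plain string
            config[key] = raw
    return config


print(resolve_config(["logdir=runs/model", "iterations=1000000"]))
```

Passing an empty list leaves the defaults in place, so each run without an explicit logdir would get its own directory in this sketch.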
To evaluate the trained model using the MAPS database, run the following command to calculate the note and frame metrics:
python evaluate.py runs/model/model-100000.pt
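For readers unfamiliar with the frame metrics, the sketch below computes frame-level precision, recall, and F1 from two binary piano rolls. It is a plain-Python illustration of the standard definitions and not the code in evaluate.py; the function name `frame_metrics` and the list-of-lists roll layout are assumptions. Note metrics additionally require matching note onsets within a tolerance and are not shown here.

```python
def frame_metrics(reference, estimate):
    """Frame-level precision, recall, and F1 for binary piano rolls.

    Each roll is a list of frames; each frame is a list of 0/1 flags,
    one per pitch. Both rolls are assumed to have the same shape.
    """
    tp = fp = fn = 0
    for ref_frame, est_frame in zip(reference, estimate):
        for ref, est in zip(ref_frame, est_frame):
            if est and ref:
                tp += 1  # pitch active in both rolls
            elif est and not ref:
                fp += 1  # predicted but absent from the reference
            elif ref and not est:
                fn += 1  # present in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy rolls: 3 frames, 3 pitches each
reference = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]
estimate = [[1, 0, 0], [1, 0, 0], [0, 1, 1]]
print(frame_metrics(reference, estimate))
```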
Specifying --save-path will output the transcribed MIDI files along with the piano roll images:
python evaluate.py runs/model/model-100000.pt --save-path output/
In order to test on the Maestro dataset's test split instead of the MAPS database, run:
python evaluate.py runs/model/model-100000.pt Maestro test
This implementation contains a few of the additional improvements on the model that were reported in the Maestro paper, including:
Meanwhile, this implementation does not include the following features:
Despite these differences, this implementation achieves performance comparable to what the Maestro paper reports for the model trained without data augmentation.