We aim to scale up music transformer models to the largest symbolic music dataset available.
We recommend using Conda. You can create the environment with the following command.
conda env create -f environment.yml
Due to copyright concerns, please download the MuseScore dataset yourself. You may find this repository helpful.
Get a list of filenames for each dataset.
find data/muse/muse -type f -name '*.mscz' | cut -c 16- > data/muse/original-names.txt
Convert the MSCZ files into MusPy files for processing.
python convert_muse.py
Note: You may enable multiprocessing via the -j {JOBS} option. For example, python convert_muse.py -j 10 will run the script with 10 jobs.
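For reference, the conversion essentially amounts to reading each MuseScore file with MusPy and saving it back out as JSON. Below is a minimal sketch, assuming an output directory of data/muse/json; the paths and options in the actual convert_muse.py may differ.

# Sketch of the MSCZ-to-MusPy conversion; output layout is an assumption.
from pathlib import Path

import muspy  # pip install muspy

DATA_DIR = Path("data/muse")

def convert_one(name: str) -> None:
    """Read one MuseScore file and save it as a MusPy JSON file."""
    music = muspy.read_musescore(DATA_DIR / "muse" / name)
    out_path = (DATA_DIR / "json" / name).with_suffix(".json")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    music.save_json(out_path)

if __name__ == "__main__":
    for name in (DATA_DIR / "original-names.txt").read_text().splitlines():
        convert_one(name)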
Extract a list of notes from the MusPy JSON files.
python extract.py -d muse
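Conceptually, extraction flattens each MusPy JSON file into a time-ordered list of notes. The sketch below uses the MusPy Note fields (time, pitch, duration, velocity); the exact representation produced by extract.py may differ.

# Sketch of note extraction from a MusPy JSON file.
import muspy

def extract_notes(json_path):
    """Flatten all tracks into (time, pitch, duration, velocity) tuples."""
    music = muspy.load_json(json_path)
    notes = [
        (note.time, note.pitch, note.duration, note.velocity)
        for track in music.tracks
        for note in track.notes
    ]
    notes.sort()  # order by onset time
    return notes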
Split the processed data into training, validation and test sets.
python split.py -d muse
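A random split along the lines below is the usual approach; the 8:1:1 ratio and the seed here are illustrative, so check split.py for the values actually used.

# Sketch of a reproducible train/validation/test split; ratios are assumptions.
import random
from pathlib import Path

names = Path("data/muse/original-names.txt").read_text().splitlines()
random.seed(0)  # fix the seed so the split is reproducible
random.shuffle(names)

n_valid = n_test = len(names) // 10  # assumed 8:1:1 split
splits = {
    "test": names[:n_test],
    "valid": names[n_test : n_test + n_valid],
    "train": names[n_test + n_valid :],
}
for split, split_names in splits.items():
    Path(f"data/muse/{split}-names.txt").write_text("\n".join(split_names))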
Train a Music GPT model.
Absolute positional embedding (APE):
python musicgpt/train.py -d muse -o exp/muse/ape -g 0
Relative positional embedding (RPE):
python musicgpt/train.py -d muse -o exp/muse/rpe --no-abs_pos_emb --rel_pos_emb -g 0
No positional embedding (NPE):
python musicgpt/train.py -d muse -o exp/muse/npe --no-abs_pos_emb --no-rel_pos_emb -g 0
Please run
python musicgpt/train.py -h
to see additional options.
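For context, the three settings above differ only in how position information enters the model: APE adds a learned embedding per absolute position, NPE adds nothing, and RPE (not shown in full below) instead biases attention scores by the distance between tokens. The following is a conceptual PyTorch illustration, not the actual musicgpt code.

# Conceptual sketch of APE vs. NPE at the embedding layer.
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size, dim, max_len, abs_pos_emb=True):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        # APE: one learned vector per position; NPE: skip this entirely.
        self.pos_emb = nn.Embedding(max_len, dim) if abs_pos_emb else None

    def forward(self, x):  # x: (batch, seq_len) token ids
        h = self.token_emb(x)
        if self.pos_emb is not None:
            positions = torch.arange(x.size(1), device=x.device)
            h = h + self.pos_emb(positions)
        return h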
Evaluate the trained model.
python musicgpt/evaluate.py -d muse -o exp/muse/ape -ns 100 -g 0
Please run
python musicgpt/evaluate.py -h
to see additional options.
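Evaluation for an autoregressive model of this kind typically reports the average next-token negative log-likelihood (or its exponential, the perplexity) on held-out data. The sketch below shows that computation; the metrics evaluate.py actually reports may differ.

# Sketch of held-out next-token NLL for an autoregressive model.
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_nll(model, loader, device="cuda"):
    """Average next-token negative log-likelihood over a data loader."""
    total_nll, total_tokens = 0.0, 0
    for batch in loader:  # batch: (batch_size, seq_len) token ids
        batch = batch.to(device)
        logits = model(batch[:, :-1])  # predict token t+1 from the prefix
        nll = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            batch[:, 1:].reshape(-1),
            reduction="sum",
        )
        total_nll += nll.item()
        total_tokens += batch[:, 1:].numel()
    return total_nll / total_tokens  # exponentiate for perplexity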
Generate new samples using a trained model.
python musicgpt/generate.py -d muse -o exp/muse/ape -g 0
Please run
python musicgpt/generate.py -h
to see additional options.
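Generation from such a model is typically autoregressive: sample one token at a time and feed it back in. Here is a minimal sketch with temperature sampling; the decoding options generate.py actually supports may differ.

# Sketch of autoregressive sampling with a temperature parameter.
import torch

@torch.no_grad()
def sample(model, start_tokens, max_len=1024, temperature=1.0):
    """Grow a token sequence one step at a time by sampling from the model."""
    seq = start_tokens  # shape: (1, prefix_len)
    for _ in range(max_len - seq.size(1)):
        logits = model(seq)[:, -1] / temperature  # logits for the next token
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        seq = torch.cat([seq, next_token], dim=1)
    return seq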