We aim to scale up music transformer models to the largest symbolic music dataset available.
We recommend using Conda. You can create the environment with the following command.
conda env create -f environment.yml
Due to copyright concerns, please download the MuseScore dataset yourself. You may find this repository helpful.
Get a list of filenames for each dataset.
find data/muse/muse -type f -name '*.mscz' | cut -c 16- > data/muse/original-names.txt
Convert the MSCZ files into MusPy files for processing.
python convert_muse.py
Note: You may enable multiprocessing via the -j {JOBS} option. For example, python convert_muse.py -j 10 will run the script with 10 jobs.
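For reference, the conversion essentially amounts to reading each MuseScore file with MusPy and saving it back out as JSON. Below is a minimal sketch, assuming an output directory of data/muse/json; the paths and options in the actual convert_muse.py may differ.

# Sketch of the MSCZ-to-MusPy conversion; output layout is an assumption.
from pathlib import Path

import muspy  # pip install muspy

DATA_DIR = Path("data/muse")

def convert_one(name: str) -> None:
    """Read one MuseScore file and save it as a MusPy JSON file."""
    music = muspy.read_musescore(DATA_DIR / "muse" / name)
    out_path = (DATA_DIR / "json" / name).with_suffix(".json")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    music.save_json(out_path)

if __name__ == "__main__":
    for name in (DATA_DIR / "original-names.txt").read_text().splitlines():
        convert_one(name)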
Extract a list of notes from the MusPy JSON files.
python extract.py -d muse
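Conceptually, extraction flattens each MusPy JSON file into a time-ordered list of notes. The sketch below uses the MusPy Note fields (time, pitch, duration, velocity); the exact representation produced by extract.py may differ.

# Sketch of note extraction from a MusPy JSON file.
import muspy

def extract_notes(json_path):
    """Flatten all tracks into (time, pitch, duration, velocity) tuples."""
    music = muspy.load_json(json_path)
    notes = [
        (note.time, note.pitch, note.duration, note.velocity)
        for track in music.tracks
        for note in track.notes
    ]
    notes.sort()  # order by onset time
    return notes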
Split the processed data into training, validation and test sets.
python split.py -d muse
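A random split along the lines below is the usual approach; the 8:1:1 ratio and the seed here are illustrative, so check split.py for the values actually used.

# Sketch of a reproducible train/validation/test split; ratios are assumptions.
import random
from pathlib import Path

names = Path("data/muse/original-names.txt").read_text().splitlines()
random.seed(0)  # fix the seed so the split is reproducible
random.shuffle(names)

n_valid = n_test = len(names) // 10  # assumed 8:1:1 split
splits = {
    "test": names[:n_test],
    "valid": names[n_test : n_test + n_valid],
    "train": names[n_test + n_valid :],
}
for split, split_names in splits.items():
    Path(f"data/muse/{split}-names.txt").write_text("\n".join(split_names))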
Train a Music GPT model.
Absolute positional embedding (APE):
python musicgpt/train.py -d muse -o exp/muse/ape -g 0
Relative positional embedding (RPE):
python musicgpt/train.py -d muse -o exp/muse/rpe --no-abs_pos_emb --rel_pos_emb -g 0
No positional embedding (NPE):
python musicgpt/train.py -d muse -o exp/muse/npe --no-abs_pos_emb --no-rel_pos_emb -g 0
Please run
python musicgpt/train.py -h
to see additional options.
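For context, the three settings above differ only in how position information enters the model: APE adds a learned embedding per absolute position, NPE adds nothing, and RPE (not shown in full below) instead biases attention scores by the distance between tokens. The following is a conceptual PyTorch illustration, not the actual musicgpt code.

# Conceptual sketch of APE vs. NPE at the embedding layer.
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size, dim, max_len, abs_pos_emb=True):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        # APE: one learned vector per position; NPE: skip this entirely.
        self.pos_emb = nn.Embedding(max_len, dim) if abs_pos_emb else None

    def forward(self, x):  # x: (batch, seq_len) token ids
        h = self.token_emb(x)
        if self.pos_emb is not None:
            positions = torch.arange(x.size(1), device=x.device)
            h = h + self.pos_emb(positions)
        return h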
Evaluate the trained model.
python musicgpt/evaluate.py -d muse -o exp/muse/ape -ns 100 -g 0
Please run
python musicgpt/evaluate.py -h
to see additional options.
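Evaluation for an autoregressive model of this kind typically reports the average next-token negative log-likelihood (or its exponential, the perplexity) on held-out data. The sketch below shows that computation; the metrics evaluate.py actually reports may differ.

# Sketch of held-out next-token NLL for an autoregressive model.
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_nll(model, loader, device="cuda"):
    """Average next-token negative log-likelihood over a data loader."""
    total_nll, total_tokens = 0.0, 0
    for batch in loader:  # batch: (batch_size, seq_len) token ids
        batch = batch.to(device)
        logits = model(batch[:, :-1])  # predict token t+1 from the prefix
        nll = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            batch[:, 1:].reshape(-1),
            reduction="sum",
        )
        total_nll += nll.item()
        total_tokens += batch[:, 1:].numel()
    return total_nll / total_tokens  # exponentiate for perplexity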
Generate new samples using a trained model.
python musicgpt/generate.py -d muse -o exp/muse/ape -g 0
Please run
python musicgpt/generate.py -h
to see additional options.
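Generation from such a model is typically autoregressive: sample one token at a time and feed it back in. Here is a minimal sketch with temperature sampling; the decoding options generate.py actually supports may differ.

# Sketch of autoregressive sampling with a temperature parameter.
import torch

@torch.no_grad()
def sample(model, start_tokens, max_len=1024, temperature=1.0):
    """Grow a token sequence one step at a time by sampling from the model."""
    seq = start_tokens  # shape: (1, prefix_len)
    for _ in range(max_len - seq.size(1)):
        logits = model(seq)[:, -1] / temperature  # logits for the next token
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        seq = torch.cat([seq, next_token], dim=1)
    return seq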