spectraldoy / music-transformer

My project to build and train a Music Transformer in PyTorch.
GNU General Public License v3.0

Training Details #5

Closed pranavjad closed 2 months ago

pranavjad commented 3 months ago

First of all, this is a great repo and it was super helpful in understanding the music transformer! I had some questions about training. For the pretrained models you included, how many epochs did you train for, and approximately how long did training take? Do you have any graphs from these runs that you could share, and what training / validation loss did you achieve?

spectraldoy commented 3 months ago

Hey, sorry for the late response! I'm glad you found this repo helpful.

I'm actually not entirely sure about the Chopin Transformer or model6v2. However, model4v2 was trained for 1,000,000 train steps (i.e., 1,000,000 backpropagation operations through the entire model) on a random subset of the MAESTRO dataset. The vgmtransformerv4 was trained for around 150,000 training steps (roughly 500 epochs, given about 10,000 preprocessed samples and a batch size of 32) on a subset of the ADL Piano Midi dataset. Nevertheless, the VGM Transformer sounds a lot better than model4v2 in my opinion. The main reason for this, as far as I can tell, is that the VGM data I trained on is much less diverse than Western classical piano music spanning the last four centuries.
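
For reference, here is how the step/epoch arithmetic quoted above works out - an illustrative back-of-the-envelope calculation in Python, using only the numbers from this comment:

```python
# Approximate numbers quoted above for vgmtransformerv4
num_samples = 10_000   # preprocessed training samples
batch_size = 32
total_steps = 150_000  # optimizer updates (train steps)

steps_per_epoch = num_samples // batch_size   # ~312 steps per epoch
epochs = total_steps / steps_per_epoch        # ~480, i.e. roughly 500 epochs
print(f"{steps_per_epoch} steps/epoch -> about {epochs:.0f} epochs")
```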

I thus want to emphasize that the dataset you choose matters much more than the training time or the number of epochs. No matter how long I train it, this model generates much better music when the dataset is not very large and the style of music is not very diverse. This is not a very limiting constraint - for instance, there is a rich compendium of Baroque music that is highly diverse in content but not very diverse in style.

Now, to actually answer your questions - I don't remember exactly how long training took for the other models, as I mostly just let the training process run (and for the first few models, I spent most of my time debugging my code). However, for the VGM Transformer I do have the stats. I had limited hardware - just a T4 - and, training for about 1.5 hours every day, it took roughly a month to get good enough for me. I could have kept training it, since it never overfit, and I'm sure it would be faster on better hardware.
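
In case it helps anyone trying to reproduce this kind of daily-session training on limited hardware, here is a minimal sketch of checkpointing and resuming between sessions in PyTorch. The checkpoint path, the stand-in model, and the per-session epoch count are hypothetical placeholders, not this repo's actual training script:

```python
import os
import torch

CKPT = "vgm_transformer_ckpt.pt"       # hypothetical checkpoint path
model = torch.nn.Linear(8, 8)          # stand-in for the actual Music Transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Resume from the previous session's checkpoint if one exists
start_epoch = 0
if os.path.exists(CKPT):
    ckpt = torch.load(CKPT, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    start_epoch = ckpt["epoch"] + 1

for epoch in range(start_epoch, start_epoch + 5):  # however many epochs fit in today's session
    # ... run the usual training loop for one epoch here ...
    # Save after every epoch so a ~1.5 hour session can be stopped at any point
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "epoch": epoch},
        CKPT,
    )
```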

Here are the training and validation losses over time for the VGM Transformer (note that I cut out the first 8 epochs of loss data, since the loss drops quickly from 5.0 and makes the graph hard to read):

[Figure: training and validation loss curves per epoch for the VGM Transformer]

Train is blue, validation is orange.
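
If you want to produce a similar plot from your own logged losses, a rough sketch with matplotlib would be something like the following (the loss lists here are synthetic placeholders - substitute the per-epoch values recorded during training):

```python
import matplotlib.pyplot as plt

# Placeholder per-epoch losses; replace with the values logged during training
train_losses = [5.0 / (e + 1) ** 0.30 for e in range(500)]
val_losses = [5.2 / (e + 1) ** 0.28 for e in range(500)]

skip = 8  # drop the first few epochs, where the loss falls steeply from ~5.0
epochs = range(skip, len(train_losses))
plt.plot(epochs, train_losses[skip:], label="train")
plt.plot(epochs, val_losses[skip:], label="validation")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```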

Please let me know if you have any additional or follow-up questions, or if I missed something that you wanted to know about!

spectraldoy commented 2 months ago

Marking as closed due to inactivity. Hope this much info helps!