gwinndr / MusicTransformer-Pytorch

MusicTransformer written for MaestroV2 using the Pytorch framework for music generation
MIT License

Working Google Colab version #11

Open asigalov61 opened 3 years ago

asigalov61 commented 3 years ago

Hey Damon,

I think I finally did it and I was able to make a fully working Google Colab that actually plays well. I used my TMIDI processors and I also streamlined the colab/implementation.

https://github.com/asigalov61/SuperPiano/blob/master/%5BTMIDI%5D_Super_Piano_3.ipynb

The only thing it does not have is the control_changes/program_changes/sustains (I still need to implement it in my processors) but it still plays pretty well IMHO on my dataset. Not sure about MAESTRO but you are welcome to try it.

Let me know if it is useful.

Thanks.

Alex.

asigalov61 commented 3 years ago

Here are some samples for you plus all stats.

It plays OK, but it is plagiarizing like the original Google version. I could easily tell because I trained on my own dataset. But even the Google version was plagiarizing pretty heavily, so IMHO this is a failed experiment.

What do you think?

Alex

PT_Samples.zip

gwinndr commented 3 years ago

Hi Alex,

I'm not sure what you mean by "plagiarizing". For the most part this algorithm is trained in mimicry so what it generates is going to sound very similar to what it's trained against.

The easiest way to get around this kind of thing is adding more data. In this case, midi augmentations will likely help.
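
For readers unfamiliar with the idea, two common MIDI augmentations are pitch transposition and time stretching. The following is only a minimal sketch, assuming notes are stored as plain `(start_sec, duration_sec, pitch, velocity)` tuples; the `augment` and `augmented_versions` helpers and the tuple layout are hypothetical and are not part of this repo or of TMIDI.

```python
from typing import List, Tuple

# Hypothetical note layout (an assumption): (start_sec, duration_sec, pitch, velocity)
Note = Tuple[float, float, int, int]


def augment(notes: List[Note], semitones: int = 0, stretch: float = 1.0) -> List[Note]:
    """Return a transposed and time-stretched copy of `notes`.

    Notes pushed outside the MIDI pitch range 0..127 are dropped.
    """
    out = []
    for start, dur, pitch, vel in notes:
        new_pitch = pitch + semitones
        if 0 <= new_pitch <= 127:
            out.append((start * stretch, dur * stretch, new_pitch, vel))
    return out


def augmented_versions(notes, shifts=(-2, -1, 0, 1, 2), stretches=(0.95, 1.0, 1.05)):
    """Build every (shift, stretch) combination, including the unchanged original."""
    return [augment(notes, s, t) for s in shifts for t in stretches]


song = [(0.0, 0.5, 60, 80), (0.5, 0.5, 64, 80), (1.0, 1.0, 127, 90)]
versions = augmented_versions(song)
print(len(versions))  # 15 variants from one song
```

The shift and stretch ranges here are arbitrary illustration values; which ranges actually help on a given dataset is something to test.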

Thanks! Damon

asigalov61 commented 3 years ago

Hey Damon,

When I say Google PT is plagiarizing, I mean exactly that: it blatantly plays compositions from the dataset w/o many changes/modifications. It is particularly noticeable if you train on music you are familiar with (which is always a good idea with Music AI). So as far as I can tell, this particular Music AI system/implementation will always do that, because it can mostly only mimic learned music and can't produce original music of its own. In fact, I am not sure it is even possible to create a capable Music AI atm, because my research and experience indicate that it really requires proper AGI (Artificial General Intelligence), which should be capable of that.

Please note that Google PT does not state what specifically was used to create it so it is totally understandable that many people do not notice these two critical issues. And I personally generated some compositions with Google PT models that were 100% plagiarized and could be easily IDd.
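
One rough way to put a number on this kind of copying is to measure how many n-grams of a generated token sequence also appear in the training set. This is only a sketch under the assumption that compositions are already encoded as integer token sequences; `ngrams` and `copied_fraction` are hypothetical helper names, and the choice of `n` is arbitrary.

```python
from typing import Iterable, Sequence, Set, Tuple


def ngrams(tokens: Sequence[int], n: int) -> Set[Tuple[int, ...]]:
    """All distinct runs of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copied_fraction(generated: Sequence[int],
                    training: Iterable[Sequence[int]],
                    n: int = 8) -> float:
    """Fraction of the generated n-grams that also occur somewhere in the training set."""
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    seen: Set[Tuple[int, ...]] = set()
    for seq in training:
        seen |= ngrams(seq, n)
    return len(gen & seen) / len(gen)


# Toy token sequences, made up for illustration
train = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
copy = [1, 2, 3, 4, 5, 6]
novel = [1, 3, 5, 7, 9, 11]
print(copied_fraction(copy, train, n=3))   # 1.0
print(copied_fraction(novel, train, n=3))  # 0.0
```

A value near 1.0 for long n-grams would suggest the output is replaying training material; exact token matching will miss copies that are transposed or slightly re-timed.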

Yes, adding more data may help, but usually, it is not a very effective way to go as it requires a model size increase and it will still replay learned chunks here and there. So I do not think it will be a good option.

Now, from my experience, the most simple and efficient way to make current models/implementations creative/original is to first indeed add the properly tailored augmented versions of the compositions to the dataset prior to training. And another helpful way to further improve it is to feed it VERSIONS of the desired music compositions as opposed to plain augmentation or straight-up originals.

Here is one such dataset that I use myself and in my projects. Check it out as it may be useful for you. https://github.com/asigalov61/Tegridy-MIDI-Dataset/blob/master/Relax-In-Tegridy-CC-BY-NC-SA.zip

I hope this makes sense.

Alex

gwinndr commented 3 years ago

Interesting ideas. Makes sense though I disagree that adding more data would increase model size. Adding more songs is just adding more training points.

I wish there were more details on the Google model so we could get a better idea of how they trained. All we have to go on are hyperparameters which are usually sufficient but it seems there might be some extra steps we don't know about.

I'll take a look at your dataset when I get the chance

asigalov61 commented 3 years ago

@gwinndr Hey Damon,

Topic related: I am working on an improved version of my Google Colab which I will post shortly.

Thank you for hearing me out. Much appreciated.

I was able to achieve a pretty good result with vanilla GPT2 in terms of generating original and creative (and listenable) music. Take a look here: https://github.com/asigalov61/Optimus-VIRTUOSO

This is what I would call acceptable output, which is not plagiarized or augmented music, IMHO.

AFAIK the MuseNet success was due to careful dataset prep/super-computer training, but most importantly it was due to the MULTIPLE EMBEDDINGs (AFAIK 1 embedding per instrument - MuseNet has 9 IIRC) + some crafty sparse attention implementation. I am not sure what exactly they used so if you have any suggestions here, I would love to hear them.

Now, as far as Google PT goes: yes, you are probably correct about the model size increase, because you can definitely trade model size for more compute. But not all people have access to super-computers, and from my experience there is always a trade-off between all of these model variables.

From my experience with the above-mentioned GPT2, if you can, I think you would benefit from longer training, especially with MAESTRO as it is quite large. But I would again suggest watching out for the output plagiarism because the whole point of the exercise here is to make original and interesting/listenable music, not to copy or augment existing stuff. Right?

I am trying to train your implementation again, and I will also post a short update for you soon.

Sorry if it is too much stuff, I am just trying to give you full feedback here. I hope it's ok :)

asigalov61 commented 3 years ago

@gwinndr

https://github.com/asigalov61/SuperPiano/blob/master/%5BTMIDI%5D_Super_Piano_3.ipynb

Ok, so I ran a nice training test with my updated Google Colab and your original implementation, with the following results:

1) Trained for approx 240 steps/epochs and almost 0.9 acc. on this dataset: https://github.com/asigalov61/Tegridy-MIDI-Dataset/blob/master/Tegridy-Piano-CC-BY-NC-SA.zip

2) Results were good and I could finally see some creativity and originality in the output.

3) Still not quite there but at least it is working as expected.

What do you think?

Music Transformer Pytorch Output Samples.zip

asigalov61 commented 3 years ago

@gwinndr

And here are the MAESTRO 3.0 test samples. Also decent results, but I could only get 0.4 acc and ~1 loss, which is obviously not too good.

Let me know what you think.

P.S. Samples may be a bit chordy cuz I had to convert timings to 10ms from 3 so do not mind it please...

Music-Transformer-Pytorch-MAESTRO3-0-Samples.zip
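
To illustrate why a coarser timing grid can make output sound "chordy": onsets that are only a few milliseconds apart can land on the same grid step and become simultaneous. This is a minimal sketch assuming onsets are integer milliseconds and are snapped down to the grid; `quantize_ms` and `simultaneous_groups` are hypothetical helpers, and the actual conversion used in the Colab may differ.

```python
from collections import Counter


def quantize_ms(onsets_ms, grid_ms):
    """Snap each onset (in milliseconds) down to the start of its grid step."""
    return [(t // grid_ms) * grid_ms for t in onsets_ms]


def simultaneous_groups(onsets_ms):
    """Count grid positions that hold more than one note."""
    return sum(1 for count in Counter(onsets_ms).values() if count > 1)


# Hypothetical rolled chord: four notes played 4 ms apart
rolled = [0, 4, 8, 12]
fine = quantize_ms(rolled, 3)
coarse = quantize_ms(rolled, 10)
print(fine, simultaneous_groups(fine))      # [0, 3, 6, 12] 0
print(coarse, simultaneous_groups(coarse))  # [0, 0, 0, 10] 1
```

On the finer grid every note keeps its own onset, while on the 10 ms grid three of the four notes collapse into one block chord.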

gwinndr commented 3 years ago

Hi Alex,

Results on your own dataset are sounding really good! On MAESTRO 3.0, it definitely still struggles a bit. Maestro seems to be a very challenging dataset for the model to learn.

I definitely want to dig through your TMIDI processor and get started on reading about the midi augmentations. I was nursing a wrist injury so had to be MIA for a bit. I will likely use TMIDI to test out my training hotfixes. The next on my list is the augmentations since I think those may do a lot of good especially for Maestro which seems to be so challenging.

I am definitely interested in your idea about longer training. Would likely require tweaks to the learn rate function to get the most out of this effort.
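
As a point of reference for such tweaks, here is a sketch of the warmup-then-inverse-square-root schedule from the original Transformer paper ("Attention Is All You Need"). That this is the schedule being discussed is an assumption, and the `scale` parameter is a hypothetical knob added only for illustration.

```python
def transformer_lr(step: int, d_model: int = 512, warmup: int = 4000,
                   scale: float = 1.0) -> float:
    """Linear warmup for `warmup` steps, then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)  # avoid 0 ** -0.5 on the very first call
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


# Compare the default warmup with a longer one at a few training steps
for s in (100, 4000, 40000, 400000):
    print(s, transformer_lr(s), transformer_lr(s, warmup=16000))
```

With this schedule the rate peaks at `step == warmup` and keeps shrinking afterwards, so a much longer run spends most of its time at a small learning rate. A longer warmup or a different scale are two possible adjustments to experiment with, not recommendations.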

Thanks! Damon

asigalov61 commented 3 years ago

@gwinndr

Yes, thank you very much for your reply and feedback. Much appreciated :)

I hope you feel better and your wrist heals well as you will definitely need it to continue working on this project ;)

Yes, my dataset has shown good results because it is indeed much smaller and simpler than MAESTRO, so I was not surprised that the results were better. And sure, MAESTRO is indeed quite complex and challenging for AI. I agree. If you are able to make it work, it will be a cool accomplishment.

I would be very happy if you use TMIDI, as I would love to see it being useful, especially in a project like yours. So please let me know how it goes and if you need any help/assistance with it, as I would be more than willing to advise you on how to use it easily. And any feedback, good or bad, would be greatly appreciated too.

Yes, longer training does help from my experience in some cases so def. try it. But I agree with you that it is probably better to try evaluating on validation as it indeed can help with overfitting and the overall quality of the output. Def. good point and a good place to start/good thing to try.
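
One simple way to combine longer training with validation checks is early stopping: keep training while the validation loss improves and stop once it has failed to improve for a few epochs. The sketch below is framework-agnostic; the `EarlyStopping` class is a hypothetical helper, and the loss values are made up for illustration.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improving, then rising as the model overfits
losses = [2.0, 1.5, 1.2, 1.25, 1.3, 1.4]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # 5 1.2
```

In a real training loop one would also save a checkpoint whenever `best` improves, so the weights from the best validation epoch can be restored.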

So please keep me posted on your progress and if you get good results, I would love to see the samples too. And of course, whatever you would like to share about your experience with Google PT would be greatly appreciated as I am very interested in making it work and replicating Google results.

Btw, I tried it with my multi-instrumental dataset and the results were also decent, but obviously multi-instrumental music may be way too complex and incompatible with this particular transformer implementation.

And as I have said before, I get much better results with GPT2 so I would love to see comparable results with your/Google implementation.

Google PT and MuseNet still remain the best SOTA Music AI systems, so they are most certainly worth replicating and learning from.

Anyways, let me know what you think/your progress.

Thank you very much.

Alex.

asigalov61 commented 3 years ago

P.S. Here are the GPT2 samples from my MuseNet reproduction attempt. Check them out as they should give you a good idea of what to strive for and also how GPT2 compares to Google PT. https://github.com/asigalov61/Optimus-VIRTUOSO/tree/main/Samples

Btw, do you know why GPT2 is so awesome? Did they use special tricks in the implementation? I use a GPT2/3 combo for the implementation and it seems to be superior even to Google Reformer, which is the latest and the greatest.

I can also generate, say, ~4096 tokens with GPT in about 30 seconds, which is much faster than Google PT or Reformer, and that makes me wonder why. If you have any idea why, I would love to hear it.

Thank you

asigalov61 commented 3 years ago

P.P.S. Here is what Google PT Colab says about the details of training:

_The models used here were trained on over 10,000 hours of piano recordings from YouTube, transcribed using Onsets and Frames and represented using the event vocabulary from Performance RNN.

Unlike the original Music Transformer paper, this notebook uses attention based on absolute instead of relative position; we may add models that use relative attention at some point in the future._

=================

So as you can see, they did not use relative attention, and they also used a combo of the dataset + event vocabulary from Performance RNN.
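
For context, the Performance RNN event vocabulary consists of 128 note-on events, 128 note-off events, 100 time-shift events in 10 ms steps (10 ms to 1 s), and 32 velocity bins, for 388 events in total. The sketch below shows one way to map such events to token ids; the ordering of the index ranges, the velocity binning, and the `encode` helper are assumptions made for illustration and may not match the Google models or this repo.

```python
NUM_PITCHES = 128
NUM_TIME_SHIFTS = 100      # 10 ms steps covering 10 ms .. 1 s
TIME_STEP_MS = 10
NUM_VELOCITY_BINS = 32

# Assumed index layout: note-on, note-off, time-shift, velocity
NOTE_ON_OFFSET = 0
NOTE_OFF_OFFSET = NOTE_ON_OFFSET + NUM_PITCHES
TIME_SHIFT_OFFSET = NOTE_OFF_OFFSET + NUM_PITCHES
VELOCITY_OFFSET = TIME_SHIFT_OFFSET + NUM_TIME_SHIFTS
VOCAB_SIZE = VELOCITY_OFFSET + NUM_VELOCITY_BINS


def encode(kind: str, value: int) -> int:
    """Map one event to its token id under the assumed layout."""
    if kind == "note_on":       # value: MIDI pitch 0..127
        return NOTE_ON_OFFSET + value
    if kind == "note_off":      # value: MIDI pitch 0..127
        return NOTE_OFF_OFFSET + value
    if kind == "time_shift":    # value: milliseconds, a multiple of 10 in 10..1000
        return TIME_SHIFT_OFFSET + value // TIME_STEP_MS - 1
    if kind == "velocity":      # value: MIDI velocity 0..127, binned into 32 levels
        return VELOCITY_OFFSET + value * NUM_VELOCITY_BINS // 128
    raise ValueError(f"unknown event kind: {kind}")


print(VOCAB_SIZE)  # 388
# Middle C played for half a second at velocity 80
tokens = [encode("velocity", 80), encode("note_on", 60),
          encode("time_shift", 500), encode("note_off", 60)]
print(tokens)  # [376, 60, 305, 188]
```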

asigalov61 commented 3 years ago

Sorry for so many messages....Here are a couple of samples for you to check out:

The big sample is my GPT2 improvisation Piano model playing... This is what I would call original music...

Optimus-VIRTUOSO Samples.zip

And the smaller sample is also my GPT2 which was trained on 260 Google Piano Transformer MIDIs (yes, synthetic dataset, aka cannibalization of sorts...) and it works great so I can't complain ;)

I think that GPT2 with GPT3 tweaks is the best choice for Music AI at the moment. Sorry to tell you this, but that has been my experience. So if you can reproduce Google results, it will be a great accomplishment IMHO.

asigalov61 commented 3 years ago

Ohh...and lastly...I wanted to show you exactly what I mean by Google PT plagiarizing...

Check it out...

Piano-Transformer Plagiarizm-Overfitting.zip

asigalov61 commented 3 years ago

@gwinndr A little update...

I ran a nice training run on 400 MIDIs made with Google PT unconditional model and here are the results.

Not bad IMHO, but the output was a bit monotone and not as creative as the original... However, it plays quite well, all things considered.

Google-Transformer-Unconditional-Dataset-Output-Samples.zip

xuboot commented 1 year ago

@asigalov61 Hello, I would like to know which of the two projects, MusicTransformer-Pytorch or Optimus-VIRTUOSO, currently gives better results for music generation, and where I can find your sample dataset. I want to test or train your project to see the results. I am a beginner and do not know much yet, so I would appreciate guidance from the two of you. Thanks.

xuboot commented 1 year ago

@gwinndr Hello, I would like to know which of the two projects, MusicTransformer-Pytorch or Optimus-VIRTUOSO, currently gives better results for music generation, and where I can find your sample dataset. I want to test or train your project to see the results. I am a beginner and do not know much yet, so I would appreciate guidance from the two of you. Thanks.

EKGD commented 1 year ago

@asigalov61 can you provide me with the retrained model, please? It seems like the link is dead.