salu133445 / musicgpt

Music Generative Pretrained Transformer

Thank you for sharing this work! #1

Open asigalov61 opened 2 years ago

asigalov61 commented 2 years ago

Hey Hao-Wen,

I see you finally decided to scale up?!? :) This is very nice and long overdue. :)

First of all, I wanted to share with you my large-scale model/implementation for piano. Check it out as you might find it interesting. It is basically the same thing you are doing but for solo piano.

https://github.com/asigalov61/GIGA-Piano

Secondly, I have a few questions/suggestions:

1) I would recommend using the MMD+LAKH datasets as source data. They are MIDI, they have over 750k MIDIs combined, and they are not copyrighted, so there are no issues with download/use. Using bootleg MuseScore data is kinda bad IMHO.

2) Have you tried @lucidrains Sinkhorn Transformer implementation and also his Reformer implementation? I tested Sinkhorn and it is very nice and useful for large-scale work: sparse attention + long seq_len + relatively fast speeds.

3) How do you sample data/datasets? From my experiments/experience, it is sufficient to sample by seq_len (either randomly or in order). This produces very good results and greatly reduces training time.

4) Do you plan to implement a music filter for bad compositions and for outliers such as repeats and stuff? I think it would help to improve results.

5) Last but not least, are you going to publish samples and trained models? I would be very interested to see your results.
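To make point 3 concrete, sampling by seq_len can be sketched roughly as below (a minimal sketch; the pad token and the random-window strategy are assumptions for illustration, not your actual pipeline):

```python
import random

def sample_segment(tokens, seq_len, pad=0):
    """Sample one fixed-length training segment from a tokenized MIDI.

    Short sequences are padded up to seq_len; longer ones yield a
    random contiguous window of exactly seq_len tokens.
    """
    if len(tokens) <= seq_len:
        return tokens + [pad] * (seq_len - len(tokens))
    start = random.randrange(len(tokens) - seq_len + 1)
    return tokens[start:start + seq_len]
```

Sampling in order would just replace the random start with a running offset per file.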

Anyway, this is all for now.

Let me know your thoughts...

And thanks again for sharing your work.

Alex

salu133445 commented 2 years ago

Hi Alex (@asigalov61),

Thanks for the message! Yeah, it's way long overdue 😄

I've checked out the GIGA-Piano project, and it looks super interesting! Also, I love the nice Google Colab demos!

  1. I've tested LMD in another project, the Multitrack Music Transformer. The main goal of this work is to scale the model up to the MuseScore dataset, which is the largest dataset available for now. I am personally interested in seeing what the model can learn from this large, noisy dataset.
  2. I tried the Reformer implementation, but it's super slow on my machine -- not sure why. I did not experiment with the Sinkhorn transformer. I found the linear transformer fast and efficient, and it also offers decent quality in general.
  3. Yes. The max_seq_len is set to 1024 by default. Longer music can be generated iteratively by feeding the generated music back into the model as input to continue from.
  4. No for now. That would be something orthogonal to the goal of this project in my opinion.
  5. Yes. That's exactly what I'm preparing now. Stay tuned!
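The iterative continuation in point 3 amounts to a sliding-window loop; a minimal sketch of the idea (the `next_token` sampler here is a hypothetical stand-in for the actual model):

```python
def generate_long(next_token, prompt, total_len, window=1024):
    """Extend a prompt to total_len tokens by repeatedly sampling one
    token conditioned on only the last `window` tokens generated so far."""
    tokens = list(prompt)
    while len(tokens) < total_len:
        context = tokens[-window:]  # only the trailing window is visible
        tokens.append(next_token(context))
    return tokens
```

Anything earlier than the trailing window falls out of the model's context, which is why long-range coherence is the hard part of this approach.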

Thanks for the feedback!

Best, Herman

asigalov61 commented 2 years ago

@salu133445 Thank you for your response and thank you for your kind words in regard to GIGA-Piano. It means a lot to me :) Also, thank you very much for starring my repos. I really appreciate it too. :)

1) Yes, LMD (LAKH) is a standard of course, but I was talking about the new MMD dataset/MIDI scrape. Here is the link, which I think you may find interesting: https://github.com/jeffreyjohnens/MetaMIDIDataset

MMD+LAKH == ~750k MIDIs, as I stated in my original post, so it is comparable to MuseScore. I totally understand if you want to use MuseScore anyway. That's okay and I really have no problem with it. I just wanted to direct your attention to the MIDI alternative.

2) Yes, check out Sinkhorn by lucidrains, and thank you for sharing your feedback about the linear transformer. I will look into it more. The reason I recommended Reformer is that it can handle extra-long sequences. I am currently testing lucidrains' reformer-pytorch implementation with an 8192*4 seq_len and a quad MIDI note encoding (4 integers per MIDI note). This gives 8192 multi-instrumental MIDI notes per seq_len, which should be plenty to cover most multi-instrumental compositions in full.

And yes, Reformer is indeed slow (to train) but the advantage is the very long seq_len.

What is the max seq_len you have tested with the linear transformer? I am just curious... Also, what were the model's hyperparameters? How many layers, in particular?
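To illustrate what I mean by the quad encoding, here is a minimal sketch (the particular fields -- delta-time, duration, channel, pitch -- are just for illustration, not necessarily the exact encoding I use):

```python
def encode_note(delta_time, duration, channel, pitch):
    # One MIDI note -> 4 integers. The field choice here is illustrative.
    return [delta_time, duration, channel, pitch]

def encode_score(notes):
    """Flatten a list of note tuples into a token sequence:
    8192 notes -> 8192 * 4 = 32768 tokens (the Reformer seq_len)."""
    tokens = []
    for note in notes:
        tokens.extend(encode_note(*note))
    return tokens
```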

3) RE: iterative/sliding-window generation... It does not always work well, especially for long sequences, as you probably know. GIGA-Piano is actually a good example of this problem: it also uses a 1024 seq_len, which is why it is not very good at iterative auto-continuations, unfortunately.

4) Got it! From my experience, the model is usually capable of filtering out noise/garbage by itself. But pre-filtering the music before training does improve results, IMHO and in my experience with GIGA-Piano.

5) Can't wait :) I really want to see your samples and models. :) Especially if you achieve good results and a good accuracy/loss. It would also be really cool if you provided Google Colabs so that it is fast and easy to try. :)

I hope my response makes some sense to you. Please let me know your thoughts.

Thank you.

Alex