tsukasachandesu opened 1 year ago
I think concatenating the note number and note duration gives better results. Also, implementing BPE is useful for shortening sequences. Lucidrains’s repositories have several models. Which one is best?
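A minimal sketch of the two ideas above, with hypothetical token formats (the `p{pitch}_d{dur}` naming and the single-merge BPE step are illustrative assumptions, not code from any of the repositories mentioned): first join each note's pitch and duration into one compound token, then apply one BPE-style merge of the most frequent adjacent pair to show how the sequence shortens.

```python
# Illustrative sketch only: compound (pitch, duration) tokens plus one
# BPE-style merge step; token names and formats are made up for the example.
from collections import Counter

def compound_tokens(notes):
    """Join MIDI pitch and duration (e.g. in ticks) into one token per note."""
    return [f"p{pitch}_d{dur}" for pitch, dur in notes]

def bpe_merge_once(tokens):
    """Merge every occurrence of the most frequent adjacent pair into one token."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append(a + "+" + b)  # fuse the pair into a single symbol
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

notes = [(60, 480), (64, 480), (60, 480), (64, 480), (67, 960)]
toks = compound_tokens(notes)   # 5 compound tokens
short = bpe_merge_once(toks)    # the repeated pair is merged: 3 tokens
```

Repeating the merge step with a learned merge table is what a full BPE tokenizer would do; one step is enough to show why repeated note patterns compress well.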
I'll try to look into the error when I have time.
For finding good models I mostly experiment and see which ones work best. Here are some samples.
https://soundcloud.com/user-419192262-663004693/sets/perceiver-ar-4096 https://soundcloud.com/user-419192262-663004693/sets/compound-word-transformer-pop909 https://soundcloud.com/user-419192262-663004693/sets/routing-transformer-pop909
I tried to test the recurrent memory transformer in Colab Pro+ with a premium GPU, but I got an error:
RuntimeError: No available kernel. Aborting execution.
Flash attention seems to cause the error.
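One possible workaround sketch, assuming the error comes from PyTorch's scaled-dot-product attention selecting the flash kernel on a GPU/dtype combination it doesn't support: disable the flash backend so PyTorch falls back to the math or memory-efficient kernels. This uses the `torch.backends.cuda.sdp_kernel` context manager; it is not a confirmed fix for this specific repository.

```python
# Sketch: force SDPA away from the flash kernel, which can raise
# "RuntimeError: No available kernel" on unsupported GPU/dtype combos.
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 4, 16, 32)  # (batch, heads, seq_len, head_dim)

with torch.backends.cuda.sdp_kernel(
    enable_flash=False,         # skip the flash-attention kernel
    enable_math=True,           # allow the reference math kernel
    enable_mem_efficient=True,  # allow the memory-efficient kernel
):
    out = F.scaled_dot_product_attention(q, k, v)
```

Alternatively, some of lucidrains's attention modules expose a constructor flag to turn flash attention off directly; checking the model's attention class for such an option may be simpler than wrapping the forward pass.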