Open stevenwaterman opened 4 years ago
Good idea. One way to implement this might be to convert four or five tracks to piano rolls and then use Euclidean distance. https://salu133445.github.io/pypianoroll/visualization.html
I'm not really sure how the Euclidean distance could be used here. I was imagining it more like the Levenshtein distance, but the more I think about it the more complex it gets. Originally I thought that I could separate each instrument and compare them one-by-one between tracks, but if the only difference between two tracks is that one uses a flute and the other a clarinet, you'd probably still want that to be flagged up.
The MuseNet encoding is delta-time based, i.e. a token is "start note", "end note", or "wait". I think that is a good thing for comparing MuseNet encodings directly - as if we compare a track to a slightly offset version of itself, they would appear very similar (just one would have a slightly longer wait right at the start), vs an absolute-time based system where a slight delay would affect the encoding of every note.
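To make the offset argument concrete, here is a minimal sketch. It uses a made-up event format (`note@time` and `wait:n` strings), not real MuseNet tokens, and the helper names are only illustrative. It encodes the same four notes twice, once delayed by 2 ticks, and compares the absolute-time and delta-time encodings with difflib.

```python
import difflib

# Hypothetical events: (absolute_time, note). Not real MuseNet tokens.
track = [(0, "C4"), (4, "E4"), (8, "G4"), (12, "C5")]
delayed = [(t + 2, note) for t, note in track]  # same track, 2 ticks later


def absolute_tokens(events):
    """Encode each note together with its absolute start time."""
    return [f"{note}@{t}" for t, note in events]


def delta_tokens(events):
    """Encode each note, preceded by a wait token for the gap since the last event."""
    tokens, now = [], 0
    for t, note in events:
        if t > now:
            tokens.append(f"wait:{t - now}")
        tokens.append(note)
        now = t
    return tokens


def similarity(a, b):
    """Similarity ratio in [0, 1] between two token lists."""
    return difflib.SequenceMatcher(a=a, b=b, autojunk=False).ratio()


abs_sim = similarity(absolute_tokens(track), absolute_tokens(delayed))
delta_sim = similarity(delta_tokens(track), delta_tokens(delayed))
print(abs_sim, delta_sim)
```

In the absolute-time encoding every token changes, so the two versions share nothing. In the delta-time encoding only one extra wait token appears at the start.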
Since a section is precisely characterised by notes occurring at time locations within measures, such an image comparison may be the most resilient to minor note insertions and deletions, or even instrument insertions. The image is an illustration from pypianoroll, but possibly your own rendering (instruments interleaved) may be better.
3876 3882 3895 1692 1588
3979
3882 28
3979
3882 1704
3979
3882 1704
3979
3880 3882 40 1713
3979
3882 1585
3979
3882 1704 49
3979
3882
3979
3876 3882 1692 40
3979
Each odd line is a list of simultaneous notes on multiple instruments (even though in this example every note is a piano note), which we can call a note-list line, and every even line represents a wait. Now the comparison becomes easy: we can match each note-list line and come up with a score (% of notes in common on that note-list across the two sections), then match the wait lines. To compare two sections we can keep accumulating the diff, and if it is above a threshold we can call them different.
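The note-list/wait comparison could be sketched roughly as below. This makes several assumptions: tokens of 3968 and above are treated as waits (consistent with the "4001 (wait:34)" style annotations in this thread, but otherwise a guess), the per-group score is the share of notes in common, a differing wait simply halves that score, and groups are aligned by position only, so an inserted group shifts everything after it. `split_groups` and `group_similarity` are hypothetical names.

```python
WAIT_BASE = 3968  # ASSUMPTION: tokens >= 3968 are waits, with 3968 meaning wait:1


def split_groups(encoding):
    """Split a token string into (notes, wait) pairs: a set of note tokens
    and the total wait that follows them."""
    groups, notes, wait = [], set(), 0
    for token in map(int, encoding.split()):
        if token >= WAIT_BASE:
            wait += token - WAIT_BASE + 1
        else:
            if wait:  # a wait closed the previous group
                groups.append((notes, wait))
                notes, wait = set(), 0
            notes.add(token)
    if notes or wait:
        groups.append((notes, wait))
    return groups


def group_similarity(a, b):
    """Average per-position score: share of notes in common, halved when
    the waits differ. Unmatched trailing groups score zero."""
    ga, gb = split_groups(a), split_groups(b)
    if not ga and not gb:
        return 1.0
    total = 0.0
    for (notes_a, wait_a), (notes_b, wait_b) in zip(ga, gb):
        union = notes_a | notes_b
        overlap = len(notes_a & notes_b) / len(union) if union else 1.0
        total += overlap * (1.0 if wait_a == wait_b else 0.5)
    return total / max(len(ga), len(gb))


a = "3876 3882 3895 1692 1588 3979 3882 28 3979"
b = "3876 3882 3895 1692 1588 3979 3882 40 3979"
print(group_similarity(a, a), group_similarity(a, b))
```

Two sections could then be called different when `1 - group_similarity(a, b)` exceeds a chosen threshold.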
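Maybe just Levenshtein on the raw notelist may be sufficient. Let me try that in Python, which has a similar sequence-matching ratio built into difflib (SequenceMatcher, which is not strictly Levenshtein but serves the same purpose here).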
PS: Crazy idea for another feature. You could use the multitrack measure-split piano roll (example image) as your view on the left-hand side panel, showing music at a glance like a composer would see in a DAW, instead of the notes-falling Synthesia view that is preferred by learners but really does not show things of interest to a composer/arranger.
Here is a more illustrative example: 3876 2337 2605 1709 1728 1735 1747 4001 (wait:34) 2733 3969 (wait:2) 1716 2617 4004 (wait:37) 2745 3968 (wait:1) 1723 2621 4001 (wait:34) 2749 3970 (wait:3) 1725 2624 3999 (wait:32) 2752 3974 (wait:7) 2629 1733 3971 (wait:4) 71 3976 (wait:9) 2465 3986 (wait:19)
But approach two will break if a notegroup is inserted. So probably we can add a section-tick for each notegroup. Let me try.
Here is a string similarity (edit-distance style) on raw encodings using Python's difflib. I used it on 4 sections I had, which do have some significant differences in the melody lead line, so they do show low similarity. If anyone has examples of 'overfit' sections, I can run those:
import difflib
section1 = '3968 3644 3968 3894 3982 3968 1472 3976 3976 3772 3968 2340 3767 3639 3978 1473 3968 64 3968 2468 3978 1475 3968 65 3973 1472 3969 3875 2340 3767 3648 64 3968 67 3968 3969 3978 1475 3968 1463 1468 3968 1472 3982 55 3968 1470 1473 3968 60 3776 64 3968 3894 3644 3980 1463 3969 67 1475 3968 62 1472 3968 3894 65 3981 1477 3968 2468 55 3968 3772 67 3968 2335 3639 64 1473 3984 69 3968 1475 3981 67 1475 3968 67 3968 1472 3968 3875 2463 2340 3767 3639 65 3979 1477 3968 3989 69 3968 3767 3651 3968 1475 3981 1480 3989 3779 3968 3651 3968 3894 3968 1866 3978 2468 3970 72 1484 3976 3976 2340 3779 3651 76 3968 1487 3969 1868 3968 67 3968 1994 3971 64 3968 2468 3968 3970 79 3981 1473 3970 3779 3968 3875 2340 3651 3969 1477 3968 1864 1996 4000 1475 3779 3968 3894 1856 3648 65 3968 1472 1992 3968 1463 69 3981 1468 3968 3894 3968 55 3979 2468 3969 1456 3968 2340 3644 3776 1984 1987 1859 3968 64 3980 1465 3982 57 60 67 3970 48 3969 2468 1444 3772 3968 3875 2340 3620 1463 1468 3968 1472 3997 1987 3973 3755 3627 1856 3986 60 3968 64 3973 55 3972 1456 3972 3748 3755 3968 3632 1463 1468 3968 3894 1472 3968 1854 3968 1984 3975 2468 3989 60 1468 3970 3760 55 60 64 3968 2340 1463 3639 1468 3968 1472 3971 1982 1856 3970 48 3973 2468 3981 1477 3968 64 3968 1473 3969 3767 60 3968 3875 2340 1468 3644 69 1477 4002 1472 3968 1475 3968 55 3772 3968 3894 1463 3639 3968 69 3969 65 3980 60 3968 3894 1468 3983 3968 3767 64 1472 67 1475 3968 2468 2347 3636 4002 2475 3968 3875 2340 3764 3636 4004 3627 3764 4004 3755 3968 3632 3968 3894 3981 2468 3981 1847 1984 3971 64 1472 3968 2340 3639 3976 60 3968 55 3969 64 67 3968 2468 3973 1975 3968 1856 3979 1477 3968 3767 3968 3875 2340 3644 3969 1473 3968 3980 1463 1468 3968 1472 3981 55 3968 1470 65 3968 1473 1475 3968 60 3772 64 3968 3894 3627 3639 69 3978 3767 3641 3968 1463 3969 67 1475 3968 1472 3968 62 3968 3894 65 3972 3769 3643 3972 2468 3969 67 3968 55 3968 3755 3771 64 1472 3968 2340 3639 1465 3644 1468 1984 1347 1861 3997 36 3968 2468 
3970 3760 67 1475 3968 3875 2340 3620 1456 3767 57 60 3772 64 3968 1463 1468 1472 67 4004 3627 3990 3973 55 3973 60 3968 3748 3755 3968 3632 3968 3894 64 3979 2468 3986 1859 1989 3970 2340 3639 1468 3976 1477 3970 1987 3968 1857 3972 3970 2468 3972 69 3968 1475 3968 1985 3973 3875 2340 3767 3648 67 1475 1859 3974 3996 1472 3968 3760 3968 3894 3639 3983 1852 3968 60'
section2 = '3968 3644 3968 3894 1468 3991 2468 3971 60 3969 3772 3970 2340 64 1472 3651 3998 2468 3968 64 3972 3779 3968 3875 2340 1472 3656 3968 64 3969 3968 67 3972 3973 1475 3968 1468 1472 3983 60 1470 1473 1474 3968 64 3968 67 3784 3968 3894 3651 3982 1472 3968 62 1475 3968 65 3968 3894 66 3979 2468 3971 1473 67 3968 64 3779 1475 1477 3968 2340 3776 3648 3998 2468 3968 67 1475 69 3968 1472 3968 65 3968 3875 2340 3620 3632 3776 3998 3748 3627 3968 1480 3970 3760 3968 3767 3639 3983 3755 3968 3620 3978 2468 3968 67 72 3969 64 3968 1465 1468 3968 3767 3968 2340 3648 3968 3894 3997 2468 3970 60 1470 1473 1475 3968 3776 3968 2335 57 3644 3969 1467 3992 2463 3976 48 3772 65 3968 3875 2335 3627 1473 3969 1451 4001 3755 59 3968 3894 3641 1467 3984 62 3968 1470 3968 3894 59 3979 2463 3969 65 3968 1473 3968 2335 3637 3769 3968 62 3983 1467 3976 65 3970 67 3968 2463 59 3969 43 3968 3765 3968 3875 2340 3632 1468 1475 3968 1472 3970 60 3968 1468 3999 3760 3968 3639 4001 60 3968 1456 3968 1468 3968 3767 3968 3644 3968 3894 3992 2468 3976 64 1472 3968 3772 60 3968 2340 3651 3998 2468 3970 64 3969 1472 3779 67 3968 3875 2340 64 3656 3969 3979 3968 1475 3968 1468 1472 3983 60 1470 1473 3968 64 3968 67 1475 3784 3968 3894 3651 3981 1472 3968 67 3968 62 1475 3968 65 3968 3894 3978 2468 3970 1473 67 3968 64 3779 1475 1477 3968 2340 3648 3998 2468 3968 67 1475 69 3968 1472 3968 65 3968 48 3776 3968 3875 2340 3748 3620 3970 1456 3998 1480 3970 3748 3968 3627 4002 48 67 3968 72 3968 1465 1468 64 3969 3755 3629 3968 3894 3974 3987 2468 3976 3757 60 3968 2335 3627 3968 1467 1470 1475 3968 57 4002 2463 3968 3755 59 3968 3875 2335 3627 3968 1467 3970 1451 4006 3755 3968 3894 3625 3968 59 3969 1467 3986 62 1470 3968 3894 59 3980 2463 3976 3753 3968 2335 3627 3968 62 3968 1465 4008 2463 3968 57 3968 1444 3620 43 3755 1456 1463 1468 1472 67 1475 3968 3632 3970 2340 3969 3639 3974 3644 3973 3648 4004 4021 3995 4086 3997 3748 3760 3767 3772 3776 4067 3871 4022 3871 4022 3871 4016 3973 3871 
4022 3876 3882 3889 2340 48 1072 55 1207 60 1212 64 1216 67 1219 3993 60 64 67 3968 3882 55 1079 3995 3877 3882 1084 2620 3968 55 3993 2468 3968 3876 3882 2344 2748 1088 2624 3968 60 3982 2752 3978 3876 3882 2347 1079 2625 3968 2472 64 3985 48 3974 3882 1084 2624 3968 55 3968 2753 3983 2752 3977 3877 3882 60 2620 1088 3989 64 3973 3882'
section3 = '3968 3627 3644 3968 3894 3990 3970 3976 3772 3658 3968 2340 3648 3968 1468 3972 3982 1472 3972 3968 2468 3973 3755 60 3968 3875 2340 3632 1463 3786 3668 3968 1475 3968 64 3982 1468 3968 1472 3982 1470 64 3968 3760 3968 3894 3639 1473 3968 60 3981 1472 3968 67 1475 3968 62 3968 3894 65 3974 3796 3970 2468 3972 1473 67 3968 3767 3968 2340 3639 64 1475 3667 3968 1477 3980 3665 3969 3795 3969 1468 3968 55 3968 3793 3974 2468 3968 67 3968 1472 1475 69 3663 3968 3875 2340 1463 3767 3776 3651 3968 65 4001 67 3968 1480 3968 3779 3968 3648 3997 2468 3968 72 3969 64 3968 1465 60 1468 3968 3776 3968 2344 3644 3968 3894 3968 3791 3972 3991 2472 3968 60 3968 1470 1473 1475 3968 57 3968 1467 3968 2335 3639 3772 3969 3996 2463 3968 1451 3970 3767 3968 3875 2335 3655 3969 43 1451 4002 59 3783 3968 3894 1467 3651 3983 62 3968 1470 3968 3894 59 3978 2463 3969 65 3968 1473 3779 3968 2335 3639 62 3983 1467 3978 65 3969 67 3968 2463 59 3969 43 3968 3767 3968 3875 2340 3620 1468 3968 1472 1475 4003 3968 3627 4002 1456 3968 3755 3968 3632 3968 3894 3982 3976 2468 3978 3760 3658 3968 2340 3639 3986 3786 3968 3656 3977 2468 3972 3784 3968 3767 3968 3875 2340 3644 64 3656 3970 60 67 3982 1468 1472 3973 3784 3978 3748 3772 60 1470 64 1473 3968 3894 3627 3983 1472 3968 62 1475 3968 3894 65 3978 2468 3972 3755 1473 67 3968 2340 3632 64 3968 1477 3998 2468 3968 1475 3968 1472 69 3968 65 3968 48 3760 3968 3875 2340 3620 3969 1456 4002 3748 3968 3627 3983 64 3968 1473 3656 3978 67 3968 2468 3971 1472 3784 3968 3755 3968 2344 3632 3658 3968 3894 3968 65 3986 3786 3971 3656 3971 2472 3968 64 3968 1467 1470 1475 3969 3760 3968 2335 3627 3970 3784 3972 3656 3989 2463 3970 48 3968 3755 3968 3875 2335 3627 3969 1451 3969 3784 3968 3656 3997 3755 3968 3894 3622 3972 3784 3968 3656 3978 3894 3978 2463 3784 3968 3656 3969 3750 3968 2335 3627 3984 3784 3971 3632 3973 3636 3969 2463 3970 43 3639 3968 3755 59 62 67 3968 3875 2340 3970 3644 4000 3760 3632 3764 3767 3772 3971 3636 3974 3639 3973 
3644 3983 1460 3985 3760 3632 3764 3767 3772 3973 2468 3968 3636 3973 3639 3971 3760 3968 2340 3984 3764 3767 3639 3973 3644 3974 2468 3968 3648 3969 52 3767 3968 2340 1456 55 1463 1468 3976 3651 3974 3772 3776 4003 3644 3986 48 55 60 1472 3983 3779 3968 3772 3968 3649 3979 2468 3972 64 3777 3968 2340 1475 3651 3985 3779 3968 3653 3977 2468 3973 67 3781 3968'
section4 = '3968 3644 3968 3894 3990 3978 3772 3968 2335 3639 3998 2463 3973 3767 3968 3875 2340 3644 3969 3968 1470 3979 1468 1472 3970 62 3981 1470 1473 3968 60 3772 64 3968 3894 3639 3982 1472 3968 1475 3968 62 3968 3894 65 3979 2468 3971 1473 67 3968 3767 3968 2340 3636 64 3968 1477 3998 2468 3968 1475 3968 1472 69 3968 65 3968 3764 3968 3875 2340 3620 3651 3968 1868 3968 1456 3969 3660 4000 3748 3779 3788 3968 3627 3648 3656 3997 2468 3970 1468 67 3968 64 3968 1463 3968 3755 3776 3784 3968 2344 3644 3968 3651 3968 3894 3995 2472 3973 3772 1470 1473 1475 3779 3968 3639 3651 3968 2335 3643 3968 60 3968 1467 3999 2463 3970 48 3767 65 3968 3875 2335 3627 1473 1996 3969 1451 1866 4002 3755 59 3968 3894 3622 1467 3771 3649 3969 3779 3968 3643 3975 3771 3968 3643 3968 1994 3968 62 3777 1863 3968 1470 3646 3968 3771 3968 3894 59 3979 2463 3968 65 3968 1473 3968 3750 3774 3968 2335 3627 62 3661 3983 1467 3978 65 3970 2463 67 3968 59 3968 43 3968 3755 3968 3875 2340 3620 1468 3968 1473 1475 3994 3974 3748 3968 3627 3976 1991 3990 1456 3972 3755 3968 3894 3632 3992 2468 3976 1868 3789 3968 3760 3968 2340 3639 3660 3978 3658 3969 3788 3974 3656 3972 3786 3970 2468 3973 3767 3968 3875 2340 3644 65 3969 60 3784 3968 67 3971 1996 3974 1468 1472 1864 3983 1470 1473 1992 1866 3968 60 3772 64 3968 3894 3639 3983 1472 1994 1868 3968 1475 3968 62 3968 3894 65 3979 2468 3972 3767 1473 67 3968 2340 3636 64 1477 3998 2468 3968 1475 1996 3968 69 3968 1472 3968 65 3968 48 3764 3660 3968 3875 2340 3620 1859 3968 1468 3968 1456 3998 1480 3969 3748 3788 3968 3627 3656 3997 2468 3968 67 72 3969 64 3968 1465 60 1468 3784 3968 3755 3968 2344 3632 3651 3968 3894 3997 2472 3968 60 1470 1473 1475 3968 3779 1987 3968 3760 57 1861 3968 2335 3627 1467 3968 3651 3999 2463 3968 1869 3968 1989 3968 48 65 3968 3875 2335 3755 3639 55 1463 1473 3970 1451 4001 3767 59 3968 3894 3622 1467 3649 3969 3779 3968 3643 3975 3771 3968 3643 1997 3969 3777 1866 3968 3771 62 3968 1470 3646 3968 3894 59 3979 2463 
3968 65 3968 1473 3968 3750 3968 2335 3627 3774 3968 62 3661 3980 1994 3968 1467 3968 1864 3976 65 3970 2463 67 3968 59 3968 43 3968 3755 3968 3875 2340 3620 1468 3968 1472 1475 4004 3627 4001 1456 3968 3755 1992 3968 3632 3968 3894 3991 2468 3975 1868 3968 3789 3968 2340 3760 3639 3968 3660 3976 3658 3969 3788 3974 3656 3972 3786 3971 2468'
sectionlist = [section1, section2, section3, section4]
# Compare every section against every other, token by token
for a in sectionlist:
    for b in sectionlist:
        sm = difflib.SequenceMatcher(a=a.split(), b=b.split())
        print(round(sm.ratio(), 2), end=' ')
    print()
Result:
1.0 0.03 0.02 0.03
0.04 1.0 0.05 0.05
0.03 0.03 1.0 0.09
0.07 0.04 0.07 1.0
Thanks for doing that! If the raw string edit distance is good enough, I'd rather just do that. Converting it to an image seems like a massive waste, especially since the image is a deterministic output of the encoding. One obvious improvement on the string edit distance would be to map each token to a single character (or in other words, do an array-edit distance).
For example, the raw string edit distance would suggest that 3158 and 1158 are more similar than 3158 and 3160, when as whole tokens each pair is just one substitution (realistically the 2nd pair are closer). It would be ideal to have a 'cost' of replacing a token based on how 'different' the replacement is to the old one. Changing volume but keeping the instrument and pitch the same would have a very low cost, whereas changing instrument would be a big cost. I'm not aware of an algorithm that could do that though, and it may not be necessary.
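For what it's worth, the usual dynamic-programming edit distance accepts an arbitrary substitution cost, so a token-level version with a "how different" cost is possible in principle. Below is a minimal sketch. `toy_cost` is only a placeholder: it assumes that `token // 128` groups related tokens, which may not match the real MuseNet token layout, and the 0.25 weight is arbitrary.

```python
def weighted_edit_distance(a, b, sub_cost, indel_cost=1.0):
    """Levenshtein distance over token sequences where replacing token x
    with token y costs sub_cost(x, y) instead of a flat 1."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,          # delete x
                curr[j - 1] + indel_cost,      # insert y
                prev[j - 1] + sub_cost(x, y),  # replace x with y (0 if equal)
            ))
        prev = curr
    return prev[-1]


def toy_cost(x, y):
    """Placeholder cost. ASSUMPTION: token // 128 groups 'related' tokens;
    the real MuseNet token layout may differ."""
    if x == y:
        return 0.0
    return 0.25 if x // 128 == y // 128 else 1.0


base = [3158, 3968, 64]
near = weighted_edit_distance(base, [3160, 3968, 64], toy_cost)
far = weighted_edit_distance(base, [1158, 3968, 64], toy_cost)
print(near, far)
```

With this placeholder cost, swapping 3158 for 3160 is cheaper than swapping it for 1158, which is the ordering described above. A real cost function would need to decode instrument, volume and pitch from each token.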
Hello amazing people of Github! :) First I want to say that this is a really wonderful project, it seems like one of the best options so far on Github for people who want to create music with the help of AI.
Now, onto my question :) So, @stevenwaterman and @ravi-annaswamy , where are you at right now regarding this overfitting problem? What is the main obstacle? If all 4 generated branches are clearly different, does that reliably mean that the overfitting didn't happen and that the song is unique, or is there still a danger of some of the branches being problematic copyright wise?
I'd appreciate your answers very much!
Hi! I've basically suspended work on Musetree due to having too much other stuff going on. My comment from June still stands, and I think that edit distance is the best way to do it. You could probably write a custom version of Levenshtein distance to achieve what I suggested.
With regards to copyright, I'm afraid that I'll never be able to give a straight answer. Even humans struggle with that - when they hear a melody and compose a new song using part of it, there have been lawsuits over who owns the rights to that new song. Nothing that we do will ever change that, and you should assume that you are theoretically breaking the law if you use Musetree for commercial purposes. That being said, I feel comfortable discarding any songs that are clearly copies of copyrighted music, and then using Musetree for my Twitch streams and other commercial purposes. In theory I could be breaking the law, but the chance of it being a copyrighted song and me being sued and losing is so slim that I'm not worrying about it. I probably wouldn't sell the music directly though.
Hey! :) Thank you so much for the reply Steven!
I am still starting with the whole music generation with AI field, but I'm very interested in this project. Whether you decide to continue working on it or not, I really appreciate that you released this as an open source project! HUGE RESPECT!
I agree with your point about the life of a musician possibly being tricky regarding copyrights anyway, with or without AI, but I guess without the overfitting issue it is much easier to rest assured.
And I wouldn't mind having to remove a song from my monetized YouTube channel (where I upload music) if someone recognizes the snatch of the song in it and warns me about it. If the warning, and taking the song down is the worst thing that can happen, then I'm totally fine even with the whole overfitting issue :) My fear is having some more serious problems with the law even after I remove the song. I really don't want to get into trouble, especially without any malicious intention...(paranoid in me awakens :D).
If there is a way to train MuseTree on my own set of midi files, that would practically solve it for me, because I know by heart all the midi songs that I use in my current training dataset, and then if something ends up sounding obviously copied from the original, I'd immediately notice it (I can always edit that part in my DAW).
MuseTree uses OpenAI's MuseNet for the actual music generation, so we have very little control over it. It would be possible, in theory, to train a new AI on a known set of songs, but it would need to be an incredibly large set of songs - probably too big for you to know all of them. Additionally, it's going to be really hard to replicate the current output quality. OpenAI does publish research papers so you could probably base it off MuseNet, but that doesn't mean it's going to be easy. As for converting MuseTree to use a new backend, it wouldn't be easy, but it would be trivial in comparison to the difficulty you'd face making the new backend in the first place.
Sadly there's no good answer here - it's an open research question in both CS and Law: how do we avoid our AI breaking copyright, and when does an AI break copyright? We're not really sure.
Hey, thank you very much Steven!
I see. Could you briefly explain to me where this project reads the pretrained models from, then? I cloned the project to my PC, but it's 2.58 MB altogether, and I can't find any pretrained models.
Where did you even find any kind of source for MuseNet, if I may know? I failed to find it any time I looked at the site.
Sorry if I ask dumb questions! :)
Musetree simply sends requests to the backend used by the official musenet tool, which doesn't have a published API but has been reverse engineered by me and others.
From memory, that's handled in broker.ts
I thought there was a paper published around that time but can't find it now - the official tool is part of a blog post with plenty of detail which should start you in the right direction. The author of MuseNet also has this repo which looks like a precursor to MuseNet.
Hope that answers your (completely legit) questions!
Hey Steven, thank you very much! You are very kind.
It's all clear now :) MuseTree definitely gives the best results so far out of all other open source AI music generation projects, at least in terms of maintaining the long term structure of the song, which seems to be a key flaw to most other open source projects.
Keep up the great job!
Sometimes, MuseNet will start overfitting to a real song (and in one case, it generated the entirety of "Reach Out, I'll Be There" for me).
It's important to prevent this, or at least warn the user about it - since it causes issues with copyright if the user thinks it's an original track.
This can be detected when the 4 children of a node are all almost identical. You could probably use a string similarity algorithm on the child and parent's encodings.
There should be something to warn the user, or even prevent them from using those samples.
This should be toggle-able, and maybe allow you to set the threshold manually?
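As a rough sketch of the check described above: compare the children's encodings pairwise and flag the node when every pair is nearly identical. The function names, the `enabled` toggle and the 0.9 default threshold are all hypothetical, and this version only compares children with each other, not with the parent.

```python
import difflib
from itertools import combinations


def token_similarity(a, b):
    """Similarity in [0, 1] between two encodings, compared token by token."""
    return difflib.SequenceMatcher(a=a.split(), b=b.split(), autojunk=False).ratio()


def looks_overfit(children, threshold=0.9, enabled=True):
    """Return True when every pair of child encodings is at least
    `threshold` similar. `threshold` and `enabled` are hypothetical
    user settings; 0.9 is an arbitrary default, not a tuned value."""
    if not enabled or len(children) < 2:
        return False
    return all(token_similarity(a, b) >= threshold
               for a, b in combinations(children, 2))


identical = ["3968 3644 3968 3894 1468 3991"] * 4
varied = ["1 2 3", "4 5 6", "7 8 9", "1 5 9"]
print(looks_overfit(identical), looks_overfit(varied))
```

A UI could show a warning when `looks_overfit` returns True and expose `threshold` and `enabled` as user settings.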