some feedback / beat detection

funjobie commented 6 years ago

Hi,

I recently tried this generator, and I was quite impressed by the project. Thanks a lot! It really addresses the whole toolchain from mp3 to song, which is amazing. However, some aspects were not yet working great, so I took a closer look. I am considering creating a fork of the project so I can work on it some more and give that back as well.

The hanging UI became an issue because if the UI doesn't respond within 60 seconds, it is forced to break. So I now start the generator in a thread.

The main issue is that the song generation 'invents' notes rather than matching a detected beat. Because of that, the songs felt very off. This is happening in two places:

The BaseRhythmGenerator doesn't take the actual beats, but rather just works with the estimated BPM to create notes at fitting places. This is not really great because the BPM is just an estimate and typically a bit (or a lot) off. For example, if the real BPM is 120 and the estimate is 119, every note drifts further from its supposed position. At some point the error becomes so large that it rolls over, so occasionally (but rarely) a note matches. Another problem is that it assumes the song has a constant BPM. This is definitely not true; many artists play with the tempo to create slow sections or sections that ramp up or down. Even if they restrict themselves (maybe a section with exactly 50% BPM, so one could say the notes still match the beat), it creates notes that shouldn't be there. So my suggestion here would be that the rhythm generator only works with real beats and applies its patterns to them.
The BeatDetector also creates notes ('regular beats') rather than detecting them reliably. When I visualized the generated 'real' notes, I found that overall it did not find many beats. I am not really sure why, as the intent of it all wasn't too clear to me. I at least found it strange that during debugging I saw a median filter with a window size of 44100 being applied to an array with ~15k elements, but I'm not sure if that is its only problem. I created an alternative detector with the intention of identifying all / the most relevant notes, based on some ideas seen in the code but applied differently. The idea is to first identify the most influential frequency for each time position, which is the one with the greatest variation in strength over a window (of 1 second). Focusing on one frequency rather than all of them helps isolate the beat and gives a clearer beat. Then beat candidates are found: local maxima in strength above a certain threshold (only considering the frequency that is currently most important). After that, beats that are too close to each other (within a window of some milliseconds; I'm not sure about the right range) are merged, with the stronger beat consuming the weaker neighbors (important, because averaging would create artificial off-beat data!). This approach can handle corner cases like varying BPM quite nicely, so that one can feel a difference between song sections. It produces a lot of notes during intense sections, but I think it is better if the rhythm generator decides which ones not to use.
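The three steps of the alternative detector described above (dominant frequency, thresholded local maxima, strongest-wins merging) could be sketched roughly as follows. This is not the code from the diff: the function names, the `frames` layout (one list of per-band strengths per time step), and the default `merge_seconds` value are all assumptions made for illustration.

```python
def dominant_band(frames, t, half_window):
    """Index of the band whose strength varies most around frame t."""
    lo = max(0, t - half_window)
    hi = min(len(frames), t + half_window + 1)
    window = frames[lo:hi]

    def variation(band):
        column = [frame[band] for frame in window]
        return max(column) - min(column)

    return max(range(len(frames[0])), key=variation)


def detect_beats(frames, frames_per_second, threshold, merge_seconds=0.05):
    """Return beat times in seconds (hypothetical sketch, not the fork's code)."""
    half_window = frames_per_second // 2  # about a 1-second window in total

    # Step 1 + 2: local maxima above the threshold, in the dominant band only.
    candidates = []
    for t in range(1, len(frames) - 1):
        band = dominant_band(frames, t, half_window)
        strength = frames[t][band]
        if (strength > threshold
                and strength > frames[t - 1][band]
                and strength >= frames[t + 1][band]):
            candidates.append((t, strength))

    # Step 3: visit the strongest first; a weaker beat too close to an
    # already kept beat is consumed instead of being averaged in.
    merge_frames = merge_seconds * frames_per_second
    kept = []
    for t, strength in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(abs(t - kept_t) > merge_frames for kept_t, _ in kept):
            kept.append((t, strength))

    return sorted(t / frames_per_second for t, _ in kept)
```

Because the merge step keeps the position of the stronger beat unchanged, every returned time is one that was actually detected, which is the point made above about averaging.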

I attached a diff file with the tweaks I made: unified_diff_song_generator.diff.txt. It is by no means well written and is just meant to get the idea across, but if you want to take a look or try it, feel free to do so. With that change the generator uses the new beat detection to create a song that only contains aligned notes. It can handle varying BPMs during a song as well, so that fast sections typically result in more notes than slow sections. The note generation is currently reduced to just random notes, mainly to verify the detector (and only Expert difficulty, to debug it faster; not that it matters while the difficulty filter is commented out).
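To put numbers on the 120 vs. 119 BPM example from earlier in this comment, here is a small worked calculation of how far a BPM-only grid drifts from the real beats. The function names are made up for this illustration.

```python
def grid_drift(real_bpm, estimated_bpm, beat_index):
    """Seconds between the n-th real beat and the n-th grid beat."""
    real_time = beat_index * 60.0 / real_bpm
    grid_time = beat_index * 60.0 / estimated_bpm
    return grid_time - real_time


def seconds_until_realigned(real_bpm, estimated_bpm):
    """Time after which the grid is exactly one whole beat off, so that a
    grid note lands on a real beat again (the 'roll over')."""
    return 60.0 / abs(real_bpm - estimated_bpm)


# After 60 beats the grid is about 0.25 s late, which is half a beat at 120 BPM.
drift_after_60_beats = grid_drift(120, 119, 60)

# The grid and the real beats only coincide again after a full minute.
realign_time = seconds_until_realigned(120, 119)
```

So with an error of just 1 BPM, notes are at their worst (half a beat off) about 30 seconds in, and only line up again once per minute.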

mindleaving commented 6 years ago

Hi funjobie. Thank you so much for your comments, thoughts and your diff. I'm looking forward to trying it out when I get back from work in the evening. I'll get back to you.

mindleaving commented 6 years ago

Hi again.

I now had the time to try it out. I'm impressed. The beats reflect the variation of the song well. It's a little too sensitive at times, where it catches onto either noise or sub-sub-beats, but I tried 3 different songs and they all had few notes in quiet sections and went crazy at guitar or drum solos :D.

Looking at the bigger picture, I have been a little frustrated by Beat Saber's choice to express its "_time" entries in beats instead of seconds. I totally agree that BPM isn't something set in stone for a song, especially if it isn't electronic music where the beat is generated. I decided to use the pure BPM for note generation because it worked "good enough" with the songs I tested, which are all electronic and hence do keep the beat.

Regarding the strange window size for the median filter, I got thrown off by that as well when I looked into it earlier today. I've added additional comments to explain the large window size compared to the input length. The reason is that I pass points with X = sample index to the median filter, and the window size relates to the X-values and not to the number of samples in the window. In BeatDetector: be aware that intensityAfter has a wrong index (minus instead of plus 1). After I fixed that, the algorithm didn't work as well. Using intensityBefore is probably enough.
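The window-size explanation above can be illustrated with a minimal sketch of a median filter whose window is measured in X units rather than in number of points. This is not the project's filter; the function name and the `(x, y)` tuple layout are assumptions. With X as a sample index at 44.1 kHz, a window of 44100 would span roughly one second of audio, however few points fall inside it.

```python
from statistics import median


def median_filter_by_x(points, window_size):
    """Median-filter (x, y) points, sorted by x, using a window in x units.

    Each output y is the median of all input y values whose x lies within
    window_size / 2 of the current x. Hypothetical sketch only.
    """
    half = window_size / 2
    result = []
    lo = 0
    hi = 0
    for x, _ in points:
        # Slide the window bounds forward; both only ever move to the right.
        while points[lo][0] < x - half:
            lo += 1
        while hi < len(points) and points[hi][0] <= x + half:
            hi += 1
        result.append((x, median(y for _, y in points[lo:hi])))
    return result
```

For example, with one point every 1024 samples and `window_size=2048`, each median is taken over at most three points, even though the window size is far larger than the number of points.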

I'm excited to see what the result looks like when you combine your beat detector with a more predictable note generation. I definitely see the potential. Great work!

mindleaving commented 6 years ago

Check out branch issue-1, where I applied your diff and performed some refactoring and reversion of code you had changed but didn't use (e.g. adding BeatIndex to Note). Also removed one unnecessary for-loop in BeatDetector. Don't know if it's of any help, but was just the result of the code review.

funjobie commented 6 years ago

Thanks for taking a look and updating the change set. I didn't notice the wrong index for intensityBefore; maybe it really isn't needed. I will probably play around with that a bit. Yes, occasionally it picks the wrong beat to focus on. I could imagine that with some more processing the frequent jumping could at least be avoided, but I find it hard to express the preference in actual code. For now I will probably focus on note generation. I have some rough ideas but nothing concrete. For example, I noticed the 'most important beat' data is usually continuous for some time, so I am considering using that to divide the song into sections with different properties. If possible I want to use different styles, rather than patterns, throughout the song, to avoid seeing the same sequences across different songs. But there is still a lot to do for this.
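One way the sectioning idea above might look: group consecutive time steps that share the same dominant frequency band, and fold very short runs (the occasional wrong jump) into the section before them. This is only a sketch of the idea as described, not anything from the fork; `split_into_sections` and `min_length` are hypothetical names.

```python
from itertools import groupby


def split_into_sections(dominant_bands, min_length):
    """Turn a per-frame list of dominant band indices into sections.

    Returns (band, start, end) tuples with end exclusive. Runs shorter than
    min_length are absorbed into the preceding section, and neighbouring
    sections with the same band are joined.
    """
    sections = []
    start = 0
    for band, run in groupby(dominant_bands):
        end = start + len(list(run))
        is_short = (end - start) < min_length
        if sections and (is_short or sections[-1][0] == band):
            prev_band, prev_start, _ = sections[-1]
            sections[-1] = (prev_band, prev_start, end)
        else:
            sections.append((band, start, end))
        start = end
    return sections
```

Each resulting section could then be given its own properties or style, so that a brief jump to another band does not split a section in two.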

funjobie commented 6 years ago

Hi, just wanted to let you know that I have now finalized my rework, created a proper fork at https://github.com/funjobie/beatsabertools, and also created a release zip. If I should prepare a pull request for any of it, feel free to let me know, although I am planning to take a bit of a break from it :) (I wouldn't mind either if someone else picks it up from here.)

To summarize the changes in the fork:

Rewrote the beat detector, mostly as you already saw, but tweaked a bit more to favor the real beat more often and to align it better. (The original filter of 1024 samples has a resolution of 23 ms, which was sometimes noticeable, or at least I imagined it was. I added a post-processing step with 128 samples for a 3 ms resolution.)
Rewrote the note generation: I found the original notes to be too predictable and not always physically possible / comfortable. I replaced it with some components:
- A state machine that provides, for each directed note, all possible next candidates. It is based on a hardcoded list where I rated each permutation from impossible/painful (never generated) through weird (very rare) to enjoyable (likely candidate).
- Instead of fixed patterns, I added different 'style processors' which filter the candidates and choose which concrete note to take next. The style switches from time to time.
Added TagLib to automatically fill the author field.
Added batch processing to process multiple songs in a row.
Removed OggVorbisEncoder and replaced it with invoking lame and oggenc: lame.exe --decode "-" | oggenc2.exe -q 5 "-" -o converts a file from mp3 to wav to ogg. It is about 50 times faster than the old approach (and doesn't run out of memory for huge mp3 files).
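The note-generation design in the summary above (rated transitions plus switchable style processors) might be sketched like this. The rating table, direction names, and both example styles are invented for illustration; the fork's hardcoded list is not reproduced here and presumably covers far more cases.

```python
import random

# Hypothetical ratings: 0 = impossible/painful (never generated),
# 1 = weird (very rare), 10 = enjoyable (likely candidate).
RATINGS = {
    ("up", "down"): 10, ("up", "up"): 0, ("up", "left"): 1,
    ("down", "up"): 10, ("down", "down"): 0, ("down", "right"): 1,
    ("left", "right"): 10, ("left", "up"): 1,
    ("right", "left"): 10, ("right", "down"): 1,
}


def candidates(previous):
    """State machine step: all allowed next directions with their ratings."""
    return {nxt: rating for (prev, nxt), rating in RATINGS.items()
            if prev == previous and rating > 0}


def vertical_style(options):
    """Example style processor: prefer up/down cuts when any are allowed."""
    preferred = {d: r for d, r in options.items() if d in ("up", "down")}
    return preferred or options


def any_style(options):
    """Example style processor: accept every allowed candidate."""
    return options


def generate(start, length, styles, switch_every, rng):
    """Pick each next direction by rating, switching style periodically."""
    sequence = [start]
    for i in range(1, length):
        style = styles[(i // switch_every) % len(styles)]
        options = style(candidates(sequence[-1]))
        directions = list(options)
        weights = [options[d] for d in directions]
        sequence.append(rng.choices(directions, weights=weights)[0])
    return sequence
```

Calling `generate("up", 50, [vertical_style, any_style], 8, random.Random(0))` yields a sequence in which no zero-rated transition can occur, since those are filtered out before a style ever sees them.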

Once again, thanks for the initial creation; without it I would have abandoned the idea long ago!
