michaelkrzyzaniak / Beat-and-Tempo-Tracking

Audio Beat and Tempo Tracking
MIT License
91 stars 12 forks source link

1x, 2x, 1.5x, 1.33x (+ etc..) tempo candidate harmonization for consistent "1x" answer #11

Open pomplesiegel opened 1 year ago

pomplesiegel commented 1 year ago

Hello!

This algorithm is fantastic and I have really enjoyed using it on my project. I find that it is spot-on when given a solid SNR and when the parameters are tweaked (I have tested it against bpm button tapping and it’s likely more accurate than I am!)

The one question I have on it is if there is a better method I could use for harmonizing the inter-related tempo candidates which it comes up with. For example, if a song has a 80 bpm tempo I often find the algorithm will often switch between an answer of

Because these are all inter-related tempo values and one may seem more likely when a rhythmic event occurs (faster hi-hat pattern…, triplet feel for a few seconds) I wonder if there is a way to weave them together to generate the most likely consistent answer for the value of “1x”.

What are your thoughts on this idea and how it could be implemented? Thank you again for your help!

Thank you, Michael

michaelkrzyzaniak commented 1 year ago

Good question, I'm glad you like the library. This is a known issue that is discussed in the literature. I don't know that there is any satisfactory resolution, because in general, to use your example, you could see {80, 160 and 120} when any of those is the correct tempo, and in general you don't want to assume that it is the lowest one, 1x.

Here are three possible strategies for dealing with it.

1) Add more 'momentum' to the algorithm so whichever candidate is chosen most often will persist even in light of other candidate being chosen every once in a while. You can do that by raising the tempo histogram decay:

btt_set_gaussian_tempo_histogram_decay(btt, 0.9999 / default is 0.999, should be between 0 and 1 /);

The downside is that the algorithm will not adapt very quickly when the tempo actually does change.

2) If you really always want it to choose the lowest tempo estimate, you could weight lower tempi more strongly. By default the algorithm gives stronger weight to intermediate tempi, but you could change that with

btt_set_log_gaussian_tempo_weight_mean(btt, BTT_DEFAULT_MIN_TEMPO);

The downside is that the tempo estimates might often be too slow.

3) If you want to do that more intelligently and you don't mind getting your hands dirty, you could implement exactly the algorithm you describe in your question. Something like this:

— round tempo candidates to nearest integer in bpm — compute GCD of each pair of tempo candidates using Euclid's algorithm — compute a weight for both candidates in the pair, the weight is bpm / gcd — multiply each candidate's score by all of its computed weights

For example, using {80, 160 and 120}

GCD(80, 160) = 80 Weight(80) = 80/80 = 1 Weight(160) = 80/160 = 1/2

GCD(80, 120) = 40 Weight(80) = 40/80 = 1/2 Weight(120) = 40/120 = 1/3

GCD(160, 120) = 40 Weight(160) = 40/160 = 1/4 Weight(120) = 40/120 = 1/3

Each candidate will be compared against each other candidates, so all of the weights will be multiplied together:

Total_Weight(80) = 1 1/2 = 1/2 Total_Weight(160) = 1/2 1/4 = 1/8 Total_Weight(120) = 1/3 * 1/3 = 1/9

Multiply the corresponding scores by these values to get new scores.

As a further example, imagine a fourth (non-harmonic) tempo estimate, 112 bpm

GCD(80 , 112) = 16 GCD(160, 112) = 16 GCD(120, 112) = 8

Weight(80) = 1 1/2 16/80 = 0.100 Weight(160) = 1/2 1/4 16/160 = 0.033 Weight(120) = 1/3 1/3 8/120 = 0.007 Weight(112) = 16/112 16/112 8/112 = 0.001

Then 80 should easily win.

The downsides: It weights lower tempi more strongly (but more intelligently that option 2 above), and it has time complexity of O(n^2) in the number of candidates, perhaps not an issue if there aren't too many candidates.

Consider multiplying in a large constant, maybe multiply each weight by 25, to prevent underflow of 32-bit floats, since the weights are always less than one.

pomplesiegel commented 1 year ago

Thank you so much for the thoughtful answers!

I have been playing around with strategies 1 & 2 you outlined above

  1. I found that this worked quite well when I adjusted other parameters. The algorithm is not responding as quickly to tempo changes / song beginnings, but I prefer this "slow and steady" approach instead of it constantly second guessing itself. I found that .9997 was a nice intermediate value for btt_set_gaussian_tempo_histogram_decay(). In conjunction with other parameter changes I think this solved the problem I described in a satisfactory manner for the moment!
  2. I played around with this mean weighting (very helpful) but in the end I think the middle value you chose (120 bpm) provides a better starting point for all kinds of music. Well done!
  3. This is a great strategy! OK, I will definitely look into implementing this in the future when I get farther along with the project.

Thank you again so much for your help! Best, Michael