Recommended settings / changes for real-time performance on arduino-based microcontrollers

pomplesiegel commented 1 year ago

Hello,

Awesome algorithm and project - super cool!

I am trying to build a proof of concept that runs this on an Arduino-based microcontroller. I am currently running some rudimentary performance tests to see if I can get it running in real time: estimating the current tempo from a buffer of audio samples and printing it to the serial monitor (perhaps an external display after that). I want to make sure the microcontroller won't fall behind.

I am not currently capturing audio. Instead, I'm feeding a static array to the algorithm for very basic testing purposes (I realize this is not ideal, but I'm trying to isolate one variable at a time).

Here is what I have found so far regarding the algorithm's performance with the code below.

Real-time performance remainder (the duration of audio that one buffer represents minus the time the algorithm takes to process it; a negative value means the device is falling behind):

RP2040 (fixed point): ~ -70 ms
ESP32-C3 (fixed point): ~ -40 ms
Teensy 4.1 (600 MHz, floating point): 0 to 3 ms
Thinking of trying SAMD51 as well...

These tests were performed with buffers of 64 samples at 44.1 kHz, as shown below.

Before I go any further, I wanted to see if there are recommended settings I should change in order to improve the performance when running on embedded microcontrollers.

What would you recommend as first pass changes in parameters and other settings in order to improve real-time performance on these kinds of microcontrollers?

Thank you very much! Michael

Current arduino code for all devices listed above:

#include <Arduino.h>
#include "BTT.h"

const int BUFFER_SIZE = 64; 
const int SAMPLE_RATE_HZ = 44100;
float audioBuffer[BUFFER_SIZE];
BTT * beatTracker = NULL; //our object for everything

const float simulatedAudioCaptureDelayTimeMS = 1000.0 * ((float)BUFFER_SIZE / (float)SAMPLE_RATE_HZ); 

void setup() {
  Serial.begin(115200);
  delay(1000); //so we can catch the serial output

  Serial.println("Simulated audio capture delay time (ms): " + String(simulatedAudioCaptureDelayTimeMS));

  /* instantiate a new object */
  beatTracker = btt_new_default();

  //clear out our buffer to start
  for(int i=0; i < BUFFER_SIZE; i++)
  {
    audioBuffer[i] = 0; //starting with 0s
  }

  //where we're making fake data, just for processing tests
  for(int i=0; i < BUFFER_SIZE / 2; i++)
  {
    audioBuffer[i] = 1.0;
  }
}

void loop() {

  //this is where we would wait for new audio data, and then process it 
  //because we don't have any data, we'll just fake it and wait
  delayMicroseconds( simulatedAudioCaptureDelayTimeMS * 1000 ); 

  auto before = micros(); 
  btt_process(beatTracker, audioBuffer, BUFFER_SIZE);
  float processingTimeMS = (float)(micros() - before) / 1000.0; 
  Serial.println("Processing took (ms): " + String( processingTimeMS )); 
  Serial.println("Real-time remainder (ms): " + String( simulatedAudioCaptureDelayTimeMS - processingTimeMS )); 
  Serial.println("Detected tempo: " + String(btt_get_tempo_bpm(beatTracker)));
}

michaelkrzyzaniak commented 1 year ago

First, if you only want to print the tempo, you can turn off beat tracking which is the highest complexity part of the algorithm. (The tempo tracker just finds the numerical tempo, while the beat tracker finds the exact location in time of the beats):

btt_set_tracking_mode(btt, BTT_ONSET_AND_TEMPO_TRACKING);

The default is: BTT_ONSET_AND_TEMPO_AND_BEAT_TRACKING

Then you can reduce the number of tempo candidates that the tempo tracker compares when deciding the tempo, which is one of the most intensive parts of the tempo tracker:

btt_set_num_tempo_candidates(BTT* self, int num_candidates);

The default is 10, but really 3 or 4 should be sufficient, if not 1 or 2, if you are willing to admit a few more wrong estimates every once in a while. The wrong estimates will usually be double or half the true tempo.

Then you can reduce the 'spectral flux stft overlap' from the default of 8 to 4, 2, or 1, which will make the whole algorithm run 1/2, 1/4, or 1/8 as often, respectively, at the expense of lower time resolution.

BTT* btt = btt_new(BTT_SUGGESTED_SPECTRAL_FLUX_STFT_LEN,
                   BTT_SUGGESTED_SPECTRAL_FLUX_STFT_OVERLAP, // <- change this one
                   BTT_SUGGESTED_OSS_FILTER_ORDER,
                   BTT_SUGGESTED_OSS_LENGTH,
                   BTT_SUGGESTED_ONSET_THRESHOLD_N,
                   BTT_SUGGESTED_CBSS_LENGTH,
                   BTT_SUGGESTED_SAMPLE_RATE,
                   BTT_DEFAULT_ANALYSIS_LATENCY_ONSET_ADJUSTMENT,
                   BTT_DEFAULT_ANALYSIS_LATENCY_BEAT_ADJUSTMENT);

pomplesiegel commented 1 year ago

Thank you so much! This is incredibly helpful.

Indeed, the largest performance gain I saw was in changing BTT_SUGGESTED_SPECTRAL_FLUX_STFT_OVERLAP to a value lower than 8. This resulted in a significant improvement in the performance on embedded targets. I now have the algorithm seemingly running in real-time (without dropping fake data buffers) on a Teensy 4.1.

I see now that reducing BTT_SUGGESTED_SPECTRAL_FLUX_STFT_OVERLAP down to a lower value than 8 results in the algorithm taking longer to calculate the first tempo (I assume this is the time resolution you referred to). Should I assume this means longer latency in general in order to detect and calculate a new tempo change in the data?

If so, are there any other settings you would recommend changing in order to improve the latency (amount of time it takes to detect and calculate a new tempo) without adding to the CPU time?

Thank you again!

michaelkrzyzaniak commented 1 year ago

Wow, I'm surprised that you got this running on Teensy 4.1. Nice work. You might try inserting clicks into your dummy buffers to make sure the algorithm is still finding the correct tempo. Lowering SPECTRAL_FLUX_STFT_OVERLAP doesn't necessarily increase the latency; it just decreases the temporal resolution, so it doesn't give you tempo estimates as often.

Lowering SPECTRAL_FLUX_STFT_OVERLAP does have other side effects because it lowers the internal sample rate. For instance you might want to decrease the size of the internal buffers by the same factor by which you decreased the overlap so that they still hold ~3 seconds of data.

int new_overlap = 2; // 1, 2, 4, or 8 (default)
float factor = new_overlap / 8.0;
BTT* btt = btt_new(BTT_SUGGESTED_SPECTRAL_FLUX_STFT_LEN * factor,
                   BTT_SUGGESTED_SPECTRAL_FLUX_STFT_OVERLAP * factor,
                   BTT_SUGGESTED_OSS_FILTER_ORDER,
                   BTT_SUGGESTED_OSS_LENGTH * factor,
                   BTT_SUGGESTED_ONSET_THRESHOLD_N * factor,
                   BTT_SUGGESTED_CBSS_LENGTH * factor,
                   BTT_SUGGESTED_SAMPLE_RATE,
                   BTT_DEFAULT_ANALYSIS_LATENCY_ONSET_ADJUSTMENT,
                   BTT_DEFAULT_ANALYSIS_LATENCY_BEAT_ADJUSTMENT);

This will also save time by reducing the size of the FFT.

Changing SPECTRAL_FLUX_STFT_OVERLAP also affects the rate at which the tempo estimate decays, making the algorithm more sluggish when the tempo changes (See figure 4 in this paper: https://michaelkrzyzaniak.com/Research/Swarms_Preprint.pdf). This may be the 'latency' you are seeing. So you should lower the decay coefficient accordingly:

float new_coefficient = pow(BTT_DEFAULT_GAUSSIAN_TEMPO_HISTOGRAM_DECAY, 8.0 / new_overlap);
btt_set_gaussian_tempo_histogram_decay(btt, new_coefficient);
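The reasoning behind the exponent can be sketched as follows: if the decay is applied once per analysis frame and the algorithm now runs 8/new_overlap times less often, raising the coefficient to that power keeps the decay per second of audio the same. The helper below is hypothetical (not part of BTT), and the coefficient in the usage note is a placeholder, not the actual default from BTT.h.

```cpp
#include <cmath>

// Hypothetical helper: rescale a per-frame decay coefficient so that the
// decay per second of audio is unchanged when frames run less often.
inline float rescaledDecay(float defaultDecay, int defaultOverlap, int newOverlap) {
  const float exponent =
      static_cast<float>(defaultOverlap) / static_cast<float>(newOverlap);
  return std::pow(defaultDecay, exponent);
}
```

With a placeholder coefficient of 0.999 and the overlap lowered from 8 to 2, the result is 0.999^4, or about 0.996: applying it once per frame at the new rate matches applying the original coefficient four times at the old rate.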

I'm not sure off-hand what other side effects there are, but lowering the internal sample rate will definitely affect the accuracy.

pomplesiegel commented 1 year ago

Incredibly helpful! Thank you so much for this info. I just created a larger array of audio samples (1 second's worth at the sample rate), which I feed into the algorithm to process in pieces, so it's a much more realistic use case.

In the larger audio array I added clicks at 2x per second, so 120 bpm.
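A test buffer like this could be generated along the following lines. This is only a sketch under assumptions: the helper name is hypothetical, and each click is modeled as a single full-scale sample, since the thread does not say how the clicks were shaped.

```cpp
#include <cstddef>

// Hypothetical helper (not part of BTT): fill `buffer` with silence plus
// single-sample clicks at the given tempo. Returns the number of clicks.
inline int fillClickTrack(float* buffer, std::size_t numSamples,
                          int sampleRateHz, float bpm) {
  for (std::size_t i = 0; i < numSamples; i++) {
    buffer[i] = 0.0f;  // start from silence
  }
  const float samplesPerBeat = 60.0f * static_cast<float>(sampleRateHz) / bpm;
  int clicks = 0;
  for (int k = 0;; k++) {
    const std::size_t pos = static_cast<std::size_t>(k * samplesPerBeat);
    if (pos >= numSamples) break;
    buffer[pos] = 1.0f;  // assumed click shape: one full-scale sample
    clicks++;
  }
  return clicks;
}
```

For one second of audio at 44100 Hz and 120 bpm, this places clicks at samples 0 and 22050. The array can then be passed to btt_process in BUFFER_SIZE-sample pieces, as in the loop from the original sketch.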

Following the changes you recommended this is what I'm getting for the canned data:

Can't do much better than that!

Excited to get this running with real audio data and continue tuning from there. Thank you!

david-res commented 2 months ago

@pomplesiegel Glad I came across this issue! I'm doing some audio file analysis on a Teensy 4.1 (already doing waveform extraction with peak magnitude + peak RMS), and I am also looking to extract the average BPM of an audio track stored on the onboard SD card.

I tried compiling your code example above with the latest version of this repo, but I am getting some compilation errors (specifically, undefined reference to bbt_new_default() and btt_get_tempo_bpm()).

Would you be able to point me to or share your basic code that you were able to get working? Thanks

pomplesiegel commented 2 months ago

Hi @david-res, glad to hear folks are out there doing similar things.

For basics, I just included the repository with my code and added

#include "BTT.h" to the top of my project.

Not sure if it's just a copy/paste error above, but you would need to correct bbt_new_default() to btt_new_default()

If BTT.h is successfully included you should be able to see both of those functions from your code. If you post the full compilation error we may be able to help further. Thanks!

david-res commented 2 months ago

@pomplesiegel It's working great, actually. I made a small folder structure change to the library and got it to compile without any errors, but I will have another go at using a fresh version of the repo just for the sake of it. Do you mind sharing what you did/are doing with the library? Did you make any use of beat tracking?

michaelkrzyzaniak / Beat-and-Tempo-Tracking
