michaelkrzyzaniak / Beat-and-Tempo-Tracking

Audio Beat and Tempo Tracking
MIT License
Tempo not found on simple drum loop #3

Open pacomacman opened 3 years ago

pacomacman commented 3 years ago

I'm trying to use your code to extract the tempo from a 16-beat, 120 bpm audio file which is around 8 seconds long altogether, but the beat_detect_callback is rarely or never called. When I extend the original piece by pasting the same buffer end to end, I occasionally get a tempo registered, sometimes very close and sometimes way off target.

How long does this algorithm need to be fed audio before a sensible tempo is detected?

michaelkrzyzaniak commented 3 years ago

There is a very important function, which I see I forgot to document (I will fix it shortly), called

double btt_get_tempo_bpm(BTT* self);

which you can call at any time to get the current tempo estimate, e.g. after you have fed all of the audio into the beat tracker. If you expect the tempo to be constant, calling

btt_set_gaussian_tempo_histogram_decay(btt, 1);

will make it so the algorithm doesn't adapt to changing tempi and will just accumulate all of the tempo estimates over the whole song. (Otherwise you will get the tempo estimate of the last part of the song only).

Tracking the beat locations runs somewhat independently of estimating the tempo, so if you don't need to know the precise moment in time of each individual beat, you can just ignore the callback or turn it off completely to save CPU with

btt_set_tracking_mode(btt, BTT_ONSET_AND_TEMPO_TRACKING);

A few seconds of music with percussion should be more than enough to get a stable estimate. I just made a new demo in demos/extract_tempo that includes a 5-second audio file; it extracts the correct tempo.

pacomacman commented 3 years ago

Thanks for getting back to me Michael.

I've pretty much copied your code and imported your 5-second sample, however it seems to be getting random results.

Here are the results of my test over a number of runs. Notice it starts off pretty convinced it's 200.9 or thereabouts and then just gets progressively worse, and I'm not sure why. Maybe it's something in my code. Are you consistently getting the same results over and over?

2021-09-16 15:57:51.436252+0100 Tempo: 200.9
2021-09-16 15:57:55.207024+0100 Tempo: 200.9
2021-09-16 15:57:56.627837+0100 Tempo: 200.9
2021-09-16 15:57:57.782094+0100 Tempo: 77.9
2021-09-16 15:58:00.073191+0100 Tempo: 200.9
2021-09-16 15:58:01.562354+0100 Tempo: 200.9
2021-09-16 15:58:03.075570+0100 Tempo: 77.9
2021-09-16 15:58:05.835605+0100 Tempo: 200.9
2021-09-16 15:58:07.227389+0100 Tempo: 87.5
2021-09-16 15:58:09.403575+0100 Tempo: 77.9
2021-09-16 15:58:12.719666+0100 Tempo: 200.9
2021-09-16 15:58:14.483047+0100 Tempo: 0.0
2021-09-16 15:58:18.832627+0100 Tempo: 94.1
2021-09-16 15:58:20.308408+0100 Tempo: 0.0

michaelkrzyzaniak commented 3 years ago

The algorithm is deterministic and will give the same result for the same input each time. Feel free to share your code and/or music sample, I could take a look.

pacomacman commented 3 years ago

My code is pretty much what you have, other than that I'm running on iOS, so malloc is not guaranteed to zero memory first. I'm thinking maybe something like an uninitialised variable might be the cause of the instability right now.

michaelkrzyzaniak commented 3 years ago

I use calloc throughout, which is guaranteed to zero memory first on iOS; malloc is not used anywhere. I've never tried using this on iOS and I don't have a good way to easily test it. I do use it on Raspberry Pi all of the time without issue. The algorithm only searches for tempi down to 50 BPM (by default) and won't generally return 0 BPM unless it hasn't received any audio, or if the audio was silent. 200.9 is the maximum tempo (by default). I'm not sure what input causes this. Can you plot your input as you feed it into BTT to verify that it is as you expect?

pacomacman commented 3 years ago

Here is the way I am calling the API. It only uses the left channel of a stereo file, but all my tests so far are on mono files anyway. I can't get a stable, accurate result, so I'm really not sure what is going on. Even with your example file, which is little more than a click track, it gives varying results.

Any ideas or recommendations are welcome.

double calcTempo(char *fname)
{
long fpos=0;
int plen;
double bpm = 0;

FILE *fp = fopen(fname, "rb");
if(!fp) return false;

long packetsize = 1024;
long outpos = 0;

bool bStereo = IsStereo(fp);
long sampleRate = getSampleRate(fp);

int frame_size = bStereo ? 4 : 2;

short *packet = (short *)malloc(frame_size * packetsize);
if(!packet)
{
    fclose(fp);
    return false;
}

float *buffer =  (float *)malloc(sizeof(float) * packetsize);

BTT* btt =           btt_new(BTT_SUGGESTED_SPECTRAL_FLUX_STFT_LEN,
                             BTT_SUGGESTED_SPECTRAL_FLUX_STFT_OVERLAP,
                             BTT_SUGGESTED_OSS_FILTER_ORDER,
                             BTT_SUGGESTED_OSS_LENGTH,
                             BTT_SUGGESTED_ONSET_THRESHOLD_N,
                             BTT_SUGGESTED_CBSS_LENGTH,
                             sampleRate,
                             BTT_DEFAULT_ANALYSIS_LATENCY_ONSET_ADJUSTMENT,
                             BTT_DEFAULT_ANALYSIS_LATENCY_BEAT_ADJUSTMENT
                             );

btt_set_gaussian_tempo_histogram_decay(btt, 1);

btt_set_tracking_mode(btt, BTT_ONSET_AND_TEMPO_TRACKING);

float mult = 1.0f / 32768.0f;

fseek(fp, HEADERSIZE, SEEK_SET);

while(1)
{
    plen = (int)fread(packet, frame_size, packetsize, fp);

    outpos = 0;

    if(bStereo)
    {
        for(unsigned int i=0; i<plen; i++)
        {
            buffer[i++] = packet[outpos] * mult;
            outpos+=2;
        }
        btt_process(btt, buffer, plen);
    }
    else
    {
        for(unsigned int i=0; i<plen; i++)
        {
            buffer[i++] = packet[outpos++] * mult;
        }

        btt_process(btt, buffer, plen);
    }

    fpos += plen;

    if(plen < packetsize) break;

    bpm = btt_get_tempo_bpm(btt);
    if(bpm != 0)
    {
          break;
    }   
}

NSLog(@"Tempo: %.1f", bpm);

free(buffer);
free(packet);
fclose(fp);

btt_destroy(btt);

return bpm;

}
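
One thing that stands out in the two conversion loops above: `i` is incremented both in the `for` header and again in `buffer[i++]`, so only every other element of `buffer` is written, while `btt_process` is still told to read `plen` samples. Since `buffer` comes from `malloc`, the skipped elements hold whatever happened to be in memory. The thread never confirms that this was the cause, so treat the following only as a sketch of a conversion that fills every sample. `pcm16_to_mono_float` is a hypothetical helper name, not part of the BTT API.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert interleaved 16-bit PCM frames to mono floats in [-1, 1).
   For stereo input only the first (left) channel is kept, as in the
   post above. Hypothetical helper, not part of the BTT API. */
static void pcm16_to_mono_float(const int16_t *in, float *out,
                                size_t num_frames, int num_channels)
{
    const float scale = 1.0f / 32768.0f;

    for (size_t i = 0; i < num_frames; i++) {
        /* i is advanced only by the loop header, so every element
           out[0] .. out[num_frames - 1] is written exactly once. */
        out[i] = (float)in[i * (size_t)num_channels] * scale;
    }
}
```

In the listing above this would replace both branches of the `if(bStereo)` block, followed by a single `btt_process(btt, buffer, plen)`. Separately, that loop stops at the first nonzero estimate and never queries the tempo after the final packet, whereas the earlier advice was to call `btt_get_tempo_bpm` once all of the audio has been fed in.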

michaelkrzyzaniak commented 3 years ago

I compiled and ran your code on my MacBook and I get consistent and correct results each time (I substituted constants for HEADERSIZE, bStereo, and sampleRate). Additionally, I made a single-view iOS project in Xcode and put your code in the viewDidLoad method of ViewController.m. I ran it in the iPhone simulator, analyzing my 5-second click/drum track. I still get correct and consistent results each time I run it. It is a mystery. I can't think of any reason you would be getting inconsistent results. I will note that not all .wav files have the format that your code assumes, so I wouldn't rely on that for user-supplied files, but that doesn't explain the issue...
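
On the point about .wav layout: the canonical PCM file has a 44-byte header, but files may carry extra chunks (for example `LIST`) before `data`, so seeking to a fixed `HEADERSIZE` can land in metadata. A minimal sketch of walking the RIFF chunks instead, assuming a little-endian PCM file already loaded into memory. `WavInfo` and `wav_find_data` are hypothetical names, not part of BTT.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    size_t   data_offset;  /* byte offset of the first sample */
    uint32_t data_size;    /* size of the sample data in bytes */
} WavInfo;

static uint32_t rd_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t rd_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is not a RIFF/WAVE file or has
   no "fmt " chunk followed by a "data" chunk. */
static int wav_find_data(const uint8_t *buf, size_t len, WavInfo *info)
{
    int have_fmt = 0;
    size_t pos = 12;  /* first chunk starts after "RIFF" <size> "WAVE" */

    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    while (pos + 8 <= len) {
        const uint8_t *chunk = buf + pos;
        uint32_t size = rd_u32le(chunk + 4);
        size_t body = pos + 8;

        if (memcmp(chunk, "fmt ", 4) == 0) {
            if (size < 16 || len - body < 16) return -1;
            info->channels        = rd_u16le(chunk + 10);
            info->sample_rate     = rd_u32le(chunk + 12);
            info->bits_per_sample = rd_u16le(chunk + 22);
            have_fmt = 1;
        } else if (memcmp(chunk, "data", 4) == 0) {
            if (!have_fmt) return -1;
            info->data_offset = body;
            /* Clamp in case the header claims more than the buffer holds. */
            info->data_size = (size > len - body) ? (uint32_t)(len - body)
                                                  : size;
            return 0;
        }

        if (size > len - body) return -1;
        pos = body + size + (size & 1u);  /* chunks are padded to even size */
    }
    return -1;
}
```

With something like this, the sample rate and channel count come from the `fmt ` chunk and reading starts at `data_offset`, rather than from assumptions about where the header ends.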

pacomacman commented 3 years ago

Thanks again for getting back to me and sorry for my late reply.

I finally found out what I had done wrong and it now seems to be working fine. I'm curious, though, how I can use the btt_get_tempo_certainty method to determine the certainty, as it's not really documented. Does it return values in the range 0-1, with closer to 1 being better?

clort81 commented 1 year ago

pacomacman, it would be helpful if you shared what you had done wrong so others could learn from it.

The 0-1 conjecture seems correct. I see the certainty go down when I switch input tracks.