tyiannak / pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
Apache License 2.0

What window size to use for beat_extraction() ? #313

Closed: ThuongCroud closed this issue 4 years ago

ThuongCroud commented 4 years ago

The beat_extraction function is defined in MidTermFeatures.py. It takes two arguments: the short_features and the window size in seconds. Maybe it is because I am new to audio analysis, but I've been struggling to figure out what the right window size should be. Usually I get this error:

ValueError: attempt to get argmax of an empty sequence

In other cases the estimated bpm always ends up as 60.

The term "window size" seems ambiguous to me. I thought it could be either of two things: (1) the length of the audio snippet for which I want to estimate the bpm, or (2) the short-term window size I used previously to get the short-term features.

For reference, this is how I am extracting the short-term features. From my understanding, the window size in this case would be 1 second.

from pyAudioAnalysis import audioBasicIO, ShortTermFeatures

[Fs, x] = audioBasicIO.read_audio_file(path)
x = audioBasicIO.stereo_to_mono(x)
# for reference: feature_extraction(signal, sampling_rate, window, step, deltas=True)
# it returns the feature matrix and the list of feature names
features, f_names = ShortTermFeatures.feature_extraction(x, Fs, 1 * Fs, 1 * Fs)

tyiannak commented 4 years ago

The bpm is not returned by ShortTermFeatures, but beat_extraction() needs the short-term features in order to calculate it. The parameters you are asking about are the short-term window and step, and they should be a few tens of milliseconds (e.g. 0.050 s).
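
To make the suggestion above concrete, here is a minimal sketch of how the two calls might fit together with a 50 ms window and step. The names `seconds_to_samples` and `estimate_bpm` are hypothetical helpers, not part of pyAudioAnalysis. The sketch assumes that `feature_extraction` takes its window and step in samples, and that `beat_extraction` takes the short-term step in seconds and returns a `(bpm, ratio)` pair; both assumptions should be checked against the installed version.

```python
def seconds_to_samples(duration_sec, sampling_rate):
    """Convert a duration in seconds to a whole number of samples."""
    return int(round(duration_sec * sampling_rate))


def estimate_bpm(path, window_sec=0.050, step_sec=0.050):
    """Hypothetical wrapper: short-term features first, then beat_extraction."""
    # Imported lazily so the helper above works without pyAudioAnalysis installed.
    from pyAudioAnalysis import audioBasicIO, ShortTermFeatures, MidTermFeatures

    sampling_rate, signal = audioBasicIO.read_audio_file(path)
    signal = audioBasicIO.stereo_to_mono(signal)

    # Assumption: window and step are expressed in samples here.
    features, _names = ShortTermFeatures.feature_extraction(
        signal,
        sampling_rate,
        seconds_to_samples(window_sec, sampling_rate),
        seconds_to_samples(step_sec, sampling_rate),
    )

    # Assumption: the second argument is the short-term step in seconds,
    # and the return value is (bpm, confidence ratio).
    bpm, ratio = MidTermFeatures.beat_extraction(features, step_sec)
    return bpm, ratio
```

Under these assumptions, `estimate_bpm("song.wav")` (a placeholder file name) would analyse the file with 50 ms frames rather than the 1 second frames used in the question.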

realies commented 4 years ago

@tyiannak, could tuning the parameters improve the accuracy? Using python audioAnalysis beatExtraction, a song in 4/4 is detected as 171 bpm, whereas loading it into something like Traktor returns 175 bpm, which is the correct value.

tyiannak commented 4 years ago

No. The beat_extraction() method implemented in this library is very simple: it is based only on local maxima detection of low-level short-term audio features. It is mostly meant for educational purposes and is not optimal in any way.
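
As a rough illustration of what tempo estimation by local maxima detection can look like, the toy sketch below picks peaks in a single feature sequence, takes the most common gap between them, and converts that gap to bpm. It is not the library's implementation: `local_maxima` and `estimate_bpm_from_peaks` are hypothetical names, and the 50 ms step is an assumption, since the thread does not say which step produced the 171 bpm result.

```python
from collections import Counter


def local_maxima(values):
    """Indices of samples strictly greater than both neighbours."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


def estimate_bpm_from_peaks(feature, step_sec):
    """Toy tempo estimate: most common gap between peaks, converted to bpm."""
    peaks = local_maxima(feature)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    if not gaps:
        raise ValueError("need at least two peaks to estimate a tempo")
    period_frames = Counter(gaps).most_common(1)[0][0]
    return 60.0 / (period_frames * step_sec)


# Synthetic feature: one spike every 7 frames (assumed 50 ms per frame).
feature = [1.0 if i % 7 == 3 else 0.0 for i in range(100)]
bpm = estimate_bpm_from_peaks(feature, 0.05)

# Tempos that whole-frame periods of 6, 7 and 8 frames can represent.
resolvable = [round(60.0 / (k * 0.05), 1) for k in (6, 7, 8)]
```

Because the beat period is measured in whole frames, only a discrete set of tempos can come out. With a 50 ms step, the candidates around 175 bpm are 200.0 (6 frames) and about 171.4 (7 frames), so a 175 bpm track would land on roughly 171 in a scheme like this. By the same formula, a 1 second step cannot yield anything above 60 bpm.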

ThuongCroud commented 4 years ago

Thanks, tyiannak. I've been able to get it to work thanks to your help.