ifsm / apollon

Feature extraction framework for content-based music similarity estimation.
BSD 3-Clause "New" or "Revised" License

Options for TimbreTrack() #23

Open TimZiemer opened 2 months ago

TimZiemer commented 2 months ago

Using TimbreTrack(), I can extract Spectral Centroid, Spectral Spread, Spectral Flux, Roughness, Sharpness and SPL (actually RMS?) from audio files. Is the audio file split into 1-second frames with 500 ms overlap? Or into frames of 2^(15) samples? Can I pass any options, like frame size, hop length, windowing function for the Fourier analysis, etc.? The documentation does not provide any details.

Teagum commented 2 months ago

Is the audio file split into 1-second frames with 500 ms overlap? Or 2^(15)

TimbreTrack uses the following default parameters defined in its constructor:

https://github.com/ifsm/comsar/blob/aeb45d03409e223ff417d8d9345e7b128fc3a3af/src/comsar/tracks/_timbre.py#L21C1-L23C59

Is the audio file split into 1-second frames with 500 ms overlap? Or 2^(15) samples?

Both apollon and comsar expect the window size and overlap parameters to be given in SAMPLES, not seconds. So, n_overlap=1024 defines windows that overlap by 1024 samples.
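If you think in seconds, you have to convert to samples yourself before building the parameters. A minimal sketch of that arithmetic follows; the helper names `seconds_to_samples` and `samples_to_seconds` are made up for illustration and are not part of apollon or comsar, and the sample rate of 44100 Hz is just an example value.

```python
def seconds_to_samples(seconds: float, fps: int) -> int:
    """Convert a duration in seconds to a whole number of samples."""
    return round(seconds * fps)


def samples_to_seconds(n_samples: int, fps: int) -> float:
    """Convert a number of samples to a duration in seconds."""
    return n_samples / fps


fps = 44100                                  # example sample rate in Hz
n_perseg = seconds_to_samples(1.0, fps)      # 1-second window
n_overlap = seconds_to_samples(0.5, fps)     # 500 ms overlap
hop = n_perseg - n_overlap                   # samples between window starts

# The other option from the question: a window of 2^15 samples.
duration_2_15 = samples_to_seconds(2**15, fps)   # roughly 0.74 s at 44100 Hz
```

The resulting `n_perseg` and `n_overlap` values are what you would pass on as window size and overlap.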

Can I pass any options ...

Yes, you can. Use the stft_params parameter in the constructor of TimbreTrack:

from apollon.signal.container import StftParams
# from apollon.signal.models import StftParams    # depending on your apollon version

params = StftParams(fps=44100, window="hann", n_perseg=2**13, n_overlap=2**12, extend=True, pad=True)

# from comsar.tracks import TimbreTrack    # assumed import path, check your comsar version
track = TimbreTrack(params)

The window parameter accepts SciPy's standard window names, but currently no homegrown functions or additional window parameters. extend=True extends the input array on both sides with half a window length of zeros. This centers the point estimates on their frames and mitigates fade-in/fade-out artifacts. pad=True additionally zero-pads the input to match the window specs exactly.
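To make the effect of extend and pad on the signal length concrete, here is a rough pure-Python sketch. It is not apollon's implementation: the function `extend_and_pad` is hypothetical, and it assumes that "match the window specs exactly" means appending zeros at the end until the signal splits into a whole number of windows for the given n_perseg and n_overlap.

```python
def extend_and_pad(signal, n_perseg, n_overlap, extend=True, pad=True):
    """Illustrative only: mimic the described extend/pad behaviour on a list."""
    hop = n_perseg - n_overlap
    out = list(signal)
    if extend:
        # Half a window of zeros on both sides, as described above.
        half = n_perseg // 2
        out = [0.0] * half + out + [0.0] * half
    if pad:
        # Assumption: append zeros so the windows tile the signal exactly.
        if len(out) < n_perseg:
            n_pad = n_perseg - len(out)
        else:
            n_pad = (-(len(out) - n_perseg)) % hop
        out = out + [0.0] * n_pad
    return out


signal = [1.0] * 10                      # toy input of 10 samples
both = extend_and_pad(signal, n_perseg=8, n_overlap=4)
only_pad = extend_and_pad(signal, n_perseg=8, n_overlap=4, extend=False)
neither = extend_and_pad(signal, 8, 4, extend=False, pad=False)

# Number of complete windows after extending and padding.
n_frames = (len(both) - 8) // 4 + 1
```

With these toy numbers the 10-sample input grows to 18 samples after extending and to 20 after padding, which gives four complete windows of 8 samples with a hop of 4.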