MEL values are wrong - Githubissues

setaperlacloche commented 3 years ago

Short version:

Output of FFT transformation is truncated in frequency domain while all the rest of code considers that FFT results cover the entire spectrum (from DC to Nyquist frequency).

Long version:

To explain more in details where the problem is, first take a look at this part of code (visualization.py, microphone_update function, lines 206 to 212)

        # Transform audio input into the frequency domain
        N = len(y_data)
        N_zeros = 2**int(np.ceil(np.log2(N))) - N
        # Pad with zeros until the next power of two
        y_data *= fft_window
        y_padded = np.pad(y_data, (0, N_zeros), mode='constant')
        YS = np.abs(np.fft.rfft(y_padded)[:N // 2])

And assuming that config values are set to default (unchanged).

Time data preparation:

N is equal to N_ROLLING_HISTORY * (MIC_RATE // FPS) ⇒ 1470 N_zeros is equal to 2048-1470 ⇒ 578 So y_data is first windowed and right padded in time domain with 578 zeros.

Next FFT is computed:

np.fft.rfft(y_padded) is an array of 1025 complex values in frequency domain from DC to 22100Hz ( = Nyquist frequency = MIC_RATE/2). Each bin is ~21.53 Hz (MIC_RATE/2048) large. Then this output is truncated to N//2 elements : the problem is here ! After this operation frequency domain of YS is now DC to ~15805Hz (bin size * (N//2-1)).

But the rest of code considers that this set of 735 (=N//2) values covers the full Nyquist bandwidth. For example in mel matrix computation (melbank.py, compute_melmat function, line 136):

        freqs = linspace(0.0, sample_rate / 2.0, num_fft_bands)

sample_rate is set in calling function to MIC_RATE / 2.0 ⇒ 22100Hz num_fft_bands is set int(config.MIC_RATE * config.N_ROLLING_HISTORY / (2.0 * config.FPS)) ⇒ 735 At this step, freqs array mismatches frequencies spectrum of YS. In consequence, expression like mel = np.atleast_2d(YS).T * dsp.mel_y.T computes wrong values.

joeybab3 commented 3 years ago

Thanks for discovering this! As I more focus on the Arduino side of things, the math/python side is a bit "above my pay grade" so to speak.

Is there any reason that the array would be truncated for performance reasons? If not and this is just a bug, let me know what you think the proper way to go about fixing it is, I will see what can be done.

setaperlacloche commented 3 years ago

I think the underlying reflection behind this bug is : As input is zero padded, output is also zero padded so I remove extra data.

Correction suggestions:

(Disclaimer: I'm not able to test the project, so all written code below need to be checked) First correct the bad line (visualization.py:212) : YS = np.abs(np.fft.rfft(y_padded)[:N // 2]) to YS = np.abs(np.fft.rfft(y_padded)) Adapt MEL bank construction (dsp.py:44) samples = int(config.MIC_RATE * config.N_ROLLING_HISTORY / (2.0 * config.FPS)) to samples = (2**int(np.ceil(np.log2(config.MIC_RATE * config.N_ROLLING_HISTORY / config.FPS ))))//2 + 1 I think with these two modifications the code will be able to run without breaking.

Due to this bug MIN_FREQUENCY and MAX_FREQUENCY in config.py were misinterpreted. They need to be corrected by a factor of 735/1025 (the ratio between the two samples formula above). For example 200 Hz becomes 144 Hz.

Global thinking of audio buffer organization:

In this project the audio buffer is a set of N_ROLLING_HISTORY (2) chunks of MIC_RATE/FPS (735) audio samples. For my purposes, I prefer using a circular buffer with a power of two size (1024 or 2048). The algorithm becomes very simple:

Add new audio data in circular buffer. Compute FFT on the entire circular buffer (No padding required).

I think this approach is simpler and perhaps faster than the current one. As I'm a newby in numpy, I wrote my circular buffer in C, but I it will be easy to implement it in numpy.

Out of subject note: In dsp.py, function melfrequencies_mel_filterbank, num_fft_bands parameter is never used.

scottlawsonbc / audio-reactive-led-strip

MEL values are wrong #309