Open setaperlacloche opened 3 years ago
Thanks for discovering this! As I more focus on the Arduino side of things, the math/python side is a bit "above my pay grade" so to speak.
Is there any reason that the array would be truncated for performance reasons? If not and this is just a bug, let me know what you think the proper way to go about fixing it is, I will see what can be done.
I think the underlying reflection behind this bug is : As input is zero padded, output is also zero padded so I remove extra data.
(Disclaimer: I'm not able to test the project, so all written code below need to be checked)
First correct the bad line (visualization.py:212) :
YS = np.abs(np.fft.rfft(y_padded)[:N // 2])
to
YS = np.abs(np.fft.rfft(y_padded))
Adapt MEL bank construction (dsp.py:44)
samples = int(config.MIC_RATE * config.N_ROLLING_HISTORY / (2.0 * config.FPS))
to
samples = (2**int(np.ceil(np.log2(config.MIC_RATE * config.N_ROLLING_HISTORY / config.FPS ))))//2 + 1
I think with these two modifications the code will be able to run without breaking.
Due to this bug MIN_FREQUENCY
and MAX_FREQUENCY
in config.py were misinterpreted. They need to be corrected by a factor of 735/1025 (the ratio between the two samples formula above). For example 200 Hz becomes 144 Hz.
In this project the audio buffer is a set of N_ROLLING_HISTORY (2) chunks of MIC_RATE/FPS (735) audio samples. For my purposes, I prefer using a circular buffer with a power of two size (1024 or 2048). The algorithm becomes very simple:
Add new audio data in circular buffer. Compute FFT on the entire circular buffer (No padding required).
I think this approach is simpler and perhaps faster than the current one. As I'm a newby in numpy, I wrote my circular buffer in C, but I it will be easy to implement it in numpy.
Out of subject note: In dsp.py, function melfrequencies_mel_filterbank, num_fft_bands parameter is never used.
Short version:
Output of FFT transformation is truncated in frequency domain while all the rest of code considers that FFT results cover the entire spectrum (from DC to Nyquist frequency).
Long version:
To explain more in details where the problem is, first take a look at this part of code (visualization.py, microphone_update function, lines 206 to 212)
And assuming that config values are set to default (unchanged).
Time data preparation:
N
is equal toN_ROLLING_HISTORY * (MIC_RATE // FPS)
⇒ 1470N_zeros
is equal to 2048-1470 ⇒ 578 Soy_data
is first windowed and right padded in time domain with 578 zeros.Next FFT is computed:
np.fft.rfft(y_padded)
is an array of 1025 complex values in frequency domain from DC to 22100Hz ( = Nyquist frequency =MIC_RATE/2
). Each bin is ~21.53 Hz (MIC_RATE/2048
) large. Then this output is truncated toN//2
elements : the problem is here ! After this operation frequency domain ofYS
is now DC to ~15805Hz (bin size * (N//2-1)).But the rest of code considers that this set of 735 (=
N//2
) values covers the full Nyquist bandwidth. For example in mel matrix computation (melbank.py, compute_melmat function, line 136):sample_rate
is set in calling function toMIC_RATE / 2.0
⇒ 22100Hznum_fft_bands
is setint(config.MIC_RATE * config.N_ROLLING_HISTORY / (2.0 * config.FPS))
⇒ 735 At this step,freqs
array mismatches frequencies spectrum ofYS
. In consequence, expression likemel = np.atleast_2d(YS).T * dsp.mel_y.T
computes wrong values.