scottlawsonbc / audio-reactive-led-strip

:musical_note: :rainbow: Real-time LED strip music visualization using Python and the ESP8266 or Raspberry Pi
MIT License

Performance issue : gaussian_filter1d #310

Open setaperlacloche opened 3 years ago

setaperlacloche commented 3 years ago

Context

I spent a lot of time studying this part of the code (visualization.py:206-222):

        # Transform audio input into the frequency domain
        N = len(y_data)
        N_zeros = 2**int(np.ceil(np.log2(N))) - N
        # Pad with zeros until the next power of two
        y_data *= fft_window
        y_padded = np.pad(y_data, (0, N_zeros), mode='constant')
        YS = np.abs(np.fft.rfft(y_padded)[:N // 2])
        # Construct a Mel filterbank from the FFT data
        mel = np.atleast_2d(YS).T * dsp.mel_y.T
        # Scale data to values more suitable for visualization
        # mel = np.sum(mel, axis=0)
        mel = np.sum(mel, axis=0)
        mel = mel**2.0
        # Gain normalization
        mel_gain.update(np.max(gaussian_filter1d(mel, sigma=1.0)))
        mel /= mel_gain.value
        mel = mel_smoothing.update(mel)

I then profiled each line of this code and found that the call to gaussian_filter1d accounts for 40% of the running time of this snippet, even though mel is a 1D array with only 24 items (!).

gaussian_filter1d performance is poor

My guess: gaussian_filter1d needs some heavy precomputation to build the filter coefficients, and it repeats that work on every call. Since gaussian_filter1d is a linear function (gaussian_filter1d(a+b) == gaussian_filter1d(a) + gaussian_filter1d(b), up to floating-point rounding), filtering the identity matrix once gives a matrix whose row i is the filter's response to unit vector i. Any later input of the same size can then be filtered with a single matrix product against that cached matrix. So I suggest the following class:

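import numpy as np
from scipy.ndimage import gaussian_filter1d

class GaussianFilter1D():
    def __init__(self, size, sigma):
        # Filter the identity matrix once: row i is the response to unit vector i
        self._arr = gaussian_filter1d( np.identity( size ), sigma = sigma )

    def filter(self, x):
        # Apply the cached filter along the last axis with a matrix product
        if x.ndim == 1:
            return  np.atleast_2d( x ).dot( self._arr )[0]
        else:
            return  x.dot( self._arr )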
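Before relying on the cached matrix, it seems worth checking that it reproduces gaussian_filter1d. The sketch below is only an illustration: it restates the class in compact form so the block is self-contained, and the test arrays (`mel`, `batch`) are random placeholders, not data from the project. It assumes the default boundary mode and filtering along the last axis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


class GaussianFilter1D:
    """Compact restatement of the cached filter, for this check only."""

    def __init__(self, size, sigma):
        # Row i holds the filter's response to the i-th unit vector.
        self._arr = gaussian_filter1d(np.identity(size), sigma=sigma)

    def filter(self, x):
        # ndarray.dot handles both 1D input (size,) and 2D input (k, size).
        return x.dot(self._arr)


rng = np.random.default_rng(0)

# 1D case: same shape as the 24-bin mel array discussed above.
mel = rng.random(24)
g = GaussianFilter1D(mel.shape[-1], sigma=1.0)
direct = gaussian_filter1d(mel, sigma=1.0)
cached = g.filter(mel)
print(np.allclose(direct, cached))

# 2D case: each row is filtered independently along the last axis.
batch = rng.random((3, 24))
direct_2d = gaussian_filter1d(batch, sigma=1.0)
cached_2d = g.filter(batch)
print(np.allclose(direct_2d, cached_2d))
```

Note that the cached matrix is only valid for one (size, sigma) pair, so a separate instance is needed for each combination.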

Bench results:

Before:

>>> mel = np.random.rand(24)
>>> ref = time.time()
>>> for _ in range(100000):
...     _ = gaussian_filter1d( mel, sigma = 1.0 )
...
>>> print( time.time()-ref )
16.78048849105835

After:

>>> mel = np.random.rand(24)
>>> g = GaussianFilter1D( mel.shape[-1], sigma = 1.0 )
>>> ref = time.time()
>>> for _ in range(100000):
...     _ = g.filter( mel )
...
>>> print( time.time()-ref )
0.577545166015625

Conclusion: about 29x faster (16.78 s vs. 0.58 s for 100,000 calls).
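For anyone who wants to reproduce the comparison, here is a rough sketch using the standard timeit module instead of time.time() around a loop. The iteration count and the helper names (`filter_matrix`, `t_direct`, `t_cached`) are arbitrary choices for this example, and absolute timings will vary with hardware and SciPy version, so no expected numbers are given.

```python
import timeit

import numpy as np
from scipy.ndimage import gaussian_filter1d

mel = np.random.rand(24)

# Precompute the filter matrix once (same idea as the GaussianFilter1D class).
filter_matrix = gaussian_filter1d(np.identity(mel.shape[-1]), sigma=1.0)

n_calls = 2000  # arbitrary; increase for more stable numbers

# Direct call: the kernel is rebuilt on every invocation.
t_direct = timeit.timeit(lambda: gaussian_filter1d(mel, sigma=1.0), number=n_calls)

# Cached call: a single matrix product per invocation.
t_cached = timeit.timeit(lambda: mel.dot(filter_matrix), number=n_calls)

print(f"direct: {t_direct:.4f} s for {n_calls} calls")
print(f"cached: {t_cached:.4f} s for {n_calls} calls")
print(f"ratio:  {t_direct / t_cached:.1f}x")
```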

Note: gaussian_filter1d is used several times in this project: in the mel computation and in the visualize_* functions.

joeybab3 commented 3 years ago

Would you be willing to submit a PR for this?

djmax9999 commented 3 years ago

Any news on this? I am running 502 LEDs in total, but I limited the number to 256 in case of performance issues on an RP3B. I am looking for any performance boost I can get to get rid of the lag. Thanks for all the support, you all do an outstanding job :-)