stuerp / foo_vis_spectrum_analyzer

A foobar2000 component that displays a spectrum analyzer.
MIT License

Use sliding DFT for CQT/VQT mode + IIR filter bank (analog-style analyzer) #1

Closed: TF3RDL closed this issue 5 months ago.

TF3RDL commented 9 months ago

I recently looked at the code for the CQT/VQT functionality, and it is in fact a Goertzel algorithm. That is fine if you only want to analyze a small number of frequencies (as with DTMF signals), but it is very CPU-intensive for a full-range spectrum analyzer. Wouldn't it be nice to use a sliding DFT (or one of its derivatives, like the sliding windowed infinite Fourier transform, which is basically an IIR filter bank) as the backend for the CQT/VQT mode? It re-uses the previous output instead of recalculating everything from scratch, which improves performance.
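
To make the performance argument concrete, here is a minimal C++ sketch of a sliding DFT for a single bin. The names `SlidingDftBin` and `DirectDftBin` are hypothetical and not taken from the component. A Goertzel or direct evaluation costs O(N) per bin each time the window is recomputed, while the sliding recurrence X_k[n] = e^{j2πk/N} (X_k[n-1] + x[n] - x[n-N]) costs O(1) per bin for every new sample. `DirectDftBin` only serves as a reference to check the result.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Sliding DFT of one bin k over the most recent N samples.
// Each new sample costs O(1): X_k[n] = e^{j2*pi*k/N} * (X_k[n-1] + x[n] - x[n-N]).
class SlidingDftBin {
public:
    SlidingDftBin(std::size_t n, std::size_t k)
        : history_(n, 0.0), twiddle_(std::polar(1.0, 2.0 * kPi * double(k) / double(n))) {
        assert(n > 0 && k < n);
    }

    // Push one sample and return the updated bin value.
    std::complex<double> Push(double x) {
        double oldest = history_[pos_];  // x[n-N], leaving the window
        history_[pos_] = x;              // x[n], entering the window
        pos_ = (pos_ + 1) % history_.size();
        bin_ = twiddle_ * (bin_ + x - oldest);
        return bin_;
    }

    std::complex<double> Value() const { return bin_; }

private:
    static constexpr double kPi = 3.14159265358979323846;
    std::vector<double> history_;  // circular buffer of the last N samples
    std::size_t pos_ = 0;          // slot holding the oldest sample
    std::complex<double> twiddle_;
    std::complex<double> bin_{0.0, 0.0};
};

// Reference: direct DFT of bin k over the last N samples of x, O(N) per call.
std::complex<double> DirectDftBin(const std::vector<double>& x, std::size_t n, std::size_t k) {
    assert(x.size() >= n);
    const double kPi = 3.14159265358979323846;
    std::complex<double> sum{0.0, 0.0};
    std::size_t start = x.size() - n;
    for (std::size_t m = 0; m < n; ++m)
        sum += x[start + m] * std::polar(1.0, -2.0 * kPi * double(k) * double(m) / double(n));
    return sum;
}
```

In long-running use, rounding errors accumulate in the recurrence, which is why sDFT implementations often multiply the state by a damping factor slightly below 1; the sketch omits that for clarity.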

I'm also thinking about spectral analysis based on an IIR bandpass filter bank, like Spectralissime. It is similar to the sliding windowed infinite Fourier transform, except that the frequency responses of these filters are symmetric on a logarithmic frequency scale (not exactly logarithmic, due to cramping near the Nyquist frequency).
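
A minimal C++ sketch of such a filter bank, assuming the band-pass biquad from Robert Bristow-Johnson's "Audio EQ Cookbook" (constant 0 dB peak gain) and a fractional-octave Q; Spectralissime's actual filter design is not described in this thread. `BandPass` and `MakeBank` are hypothetical names. Because these biquads are derived with the bilinear transform, their responses cramp near Nyquist, matching the caveat above. Level detection (peak or RMS per band) is left out.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One RBJ-cookbook band-pass biquad (constant 0 dB peak gain),
// run in transposed direct form II.
struct BandPass {
    double b0, b2, a1, a2;  // b1 is zero for this design
    double z1 = 0.0, z2 = 0.0;

    BandPass(double centerHz, double q, double sampleRate) {
        assert(centerHz > 0.0 && centerHz < sampleRate / 2.0 && q > 0.0);
        const double kPi = 3.14159265358979323846;
        double w0 = 2.0 * kPi * centerHz / sampleRate;
        double alpha = std::sin(w0) / (2.0 * q);
        double a0 = 1.0 + alpha;
        b0 = alpha / a0;
        b2 = -alpha / a0;
        a1 = -2.0 * std::cos(w0) / a0;
        a2 = (1.0 - alpha) / a0;
    }

    double Process(double x) {
        double y = b0 * x + z1;
        z1 = -a1 * y + z2;  // b1 * x term omitted because b1 == 0
        z2 = b2 * x - a2 * y;
        return y;
    }
};

// Log-spaced bands: `bandsPerOctave` bands per octave from minHz up to maxHz.
std::vector<BandPass> MakeBank(double minHz, double maxHz, double bandsPerOctave, double sampleRate) {
    assert(minHz > 0.0 && bandsPerOctave > 0.0);
    double ratio = std::pow(2.0, 1.0 / bandsPerOctave);
    double q = std::sqrt(ratio) / (ratio - 1.0);  // fractional-octave Q (assumption)
    std::vector<BandPass> bank;
    for (double f = minHz; f <= maxHz && f < sampleRate / 2.0; f *= ratio)
        bank.emplace_back(f, q, sampleRate);
    return bank;
}
```

With 1/3-octave bands from 20 Hz to 20 kHz at 48 kHz this yields 30 filters; a 1 kHz sine then settles to unit amplitude in the 1 kHz band and is strongly attenuated two octaves away.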

BTW, are there any drawbacks of using sDFT/SWIFT instead of the current Goertzel implementation for the CQT/VQT mode?

TF3RDL commented 7 months ago

To replace the Goertzel algorithm with a sliding DFT (which works better for a larger number of bands), the window function has to be implemented in the frequency domain instead of the time domain, just like the Brown-Puckette method used in FFmpeg's showcqt filter. Window functions other than cosine sums can easily be implemented via this paper, which basically uses the FFT of the window function to generate a sparse kernel for use with the sliding DFT.
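
For a cosine-sum window such as Hann, the frequency-domain kernel is exact and only three taps wide, which is why this approach is cheap. A minimal C++ sketch follows; `SlidingDft`, `HannBin` and `DirectHannBin` are hypothetical names. It keeps every bin of an N-point sliding DFT, applies a periodic Hann window afterwards as X_hann[k] = 0.5 X[k] - 0.25 X[k-1] - 0.25 X[k+1], and checks the result against time-domain windowing plus a direct DFT. Non-cosine-sum windows would need the sparse kernels from the paper, which are not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
const double kPi = 3.14159265358979323846;

// Sliding DFT that keeps every bin 0..N-1 of the latest N samples (unwindowed).
class SlidingDft {
public:
    explicit SlidingDft(std::size_t n) : history_(n, 0.0), bins_(n), twiddles_(n) {
        assert(n > 0);
        for (std::size_t k = 0; k < n; ++k)
            twiddles_[k] = std::polar(1.0, 2.0 * kPi * double(k) / double(n));
    }

    void Push(double x) {
        double delta = x - history_[pos_];  // x[n] - x[n-N]
        history_[pos_] = x;
        pos_ = (pos_ + 1) % history_.size();
        for (std::size_t k = 0; k < bins_.size(); ++k)
            bins_[k] = twiddles_[k] * (bins_[k] + delta);
    }

    // Hann window applied in the frequency domain as a 3-tap kernel
    // (bin indices wrap modulo N).
    cplx HannBin(std::size_t k) const {
        std::size_t n = bins_.size();
        return 0.5 * bins_[k] - 0.25 * bins_[(k + n - 1) % n] - 0.25 * bins_[(k + 1) % n];
    }

private:
    std::vector<double> history_;
    std::size_t pos_ = 0;
    std::vector<cplx> bins_;
    std::vector<cplx> twiddles_;
};

// Reference: periodic Hann window in the time domain, then a direct DFT of bin k.
cplx DirectHannBin(const std::vector<double>& x, std::size_t n, std::size_t k) {
    assert(x.size() >= n);
    cplx sum{0.0, 0.0};
    std::size_t start = x.size() - n;
    for (std::size_t m = 0; m < n; ++m) {
        double w = 0.5 - 0.5 * std::cos(2.0 * kPi * double(m) / double(n));
        sum += w * x[start + m] * std::polar(1.0, -2.0 * kPi * double(k * m) / double(n));
    }
    return sum;
}
```

Tracking all N bins costs O(N) per sample; a CQT-style analyzer would only track the bins (or per-band window lengths) it actually displays.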

BTW, I have a CodePen project showcasing analog-style analyzer and sliding DFT visualizations (a mockup of an obviously nonexistent foo_cqt_analyzer component) to show what I really mean.

TF3RDL commented 7 months ago

The thing is that analog-style and sliding DFT analyzers need this:

    if (_VisualisationStream->get_chunk_absolute(Chunk, PrevPlaybackTime, PlaybackTime - PrevPlaybackTime))
        ProcessAudioChunk(Chunk);

    PrevPlaybackTime = PlaybackTime;

where PrevPlaybackTime is the playback time at the previous calculation, instead of:

    if (_VisualisationStream->get_chunk_absolute(Chunk, PlaybackTime - (WindowSize / (ReactionAlignment / 2.0 + 0.5)), WindowSize))
        ProcessAudioChunk(Chunk);

where the "reaction alignment" parameter is set to 0.5, which makes it the same as the current behavior. That parameter could be made customizable later, especially since the Enhanced Spectrum Analyzer has a time offset well ahead of playback, proportional to the FFT size in samples rather than half of it as in the original Musical Spectrum. The first method is needed for these analyzers to be calculated correctly, while the other analyzers would prefer the latter sample acquisition method.
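
A minimal C++ sketch of why the incremental method matters for recursive analyzers. `IncrementalCursor` and `OnePole` are hypothetical stand-ins, not foobar2000 SDK types: the cursor hands out only the samples that have not been analyzed yet, so feeding its chunks to a recursive analyzer reproduces exactly the state of one pass over the whole signal. Re-fetching an overlapping fixed-size window instead would push some samples through the recursion more than once.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a foobar2000 SDK type): hands out only the samples
// between the previous call and the current playback time.
class IncrementalCursor {
public:
    explicit IncrementalCursor(double sampleRate) : sampleRate_(sampleRate) {
        assert(sampleRate > 0.0);
    }

    // Returns track samples in [nextSample, end), where end corresponds to
    // playbackTime. Positions are counted in samples, so there are no gaps or repeats.
    std::vector<double> Fetch(const std::vector<double>& track, double playbackTime) {
        assert(playbackTime >= 0.0);
        std::size_t end = static_cast<std::size_t>(std::llround(playbackTime * sampleRate_));
        if (end > track.size()) end = track.size();
        std::vector<double> chunk;
        if (end > nextSample_) {
            chunk.assign(track.begin() + static_cast<std::ptrdiff_t>(nextSample_),
                         track.begin() + static_cast<std::ptrdiff_t>(end));
            nextSample_ = end;
        }
        return chunk;
    }

    std::size_t NextSample() const { return nextSample_; }

private:
    double sampleRate_;
    std::size_t nextSample_ = 0;  // first sample not yet fed to the analyzer
};

// Stand-in for any recursive (IIR or sliding) analyzer: its state depends on
// every sample being seen exactly once, in order.
struct OnePole {
    double state = 0.0;
    void Feed(const std::vector<double>& samples) {
        for (double s : samples) state = 0.99 * state + 0.01 * s;
    }
};
```

Tracking a sample index rather than a floating-point time avoids rounding gaps or overlaps between consecutive chunks.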

stuerp commented 6 months ago

Quoting TF3RDL: "The thing is that analog-style and sliding DFT analyzers needs this: if (_VisualisationStream->get_chunk_absolute(Chunk, PrevPlaybackTime, PlaybackTime - PrevPlaybackTime)) ProcessAudioChunk(Chunk); PrevPlaybackTime = PlaybackTime;"

I tried that, but it did not produce the right behavior. If I remember correctly, the spectrum lagged behind the sound, especially when compared with the Enhanced Spectrum Analyzer.

Look at the result: [screenshot]

stuerp commented 6 months ago

Quoting TF3RDL: "BTW, are there any drawbacks of using sDFT/SWIFT over the current Goertzel algorithm implementation for CQT/VQT mode?"

You're asking me? You're the math guy ;-)

stuerp commented 6 months ago

Quoting TF3RDL: "BTW, I have the CodePen project showcasing analog-style analyzer and sliding DFT visualizations"

But why are there two other frequency spikes at 650 and 700 Hz when I feed it a 440 + 880 Hz sample? Also, only enabling the "NC method" switch produces this 'clean' result; everything else produces too much noise:

[screenshot]

stuerp commented 6 months ago

First implementation attempt of SWIFT: [screenshot] It does not look right to me. What do the values of the sDFT output represent? I know they're scaled between 0 and 1, but what is the unit?

TF3RDL commented 6 months ago

Quoting stuerp: "But why are there 2 other frequency spikes at 650 and 700 Hz when I feed it a 440 + 880Hz sample? Also, enabling the 'NC method' switch only produces this 'clean' result."

These two extraneous spikes (which nobody can hear anyway) are artifacts of the NC method, and I get exactly the same artifacts as your image shows, so there is nothing we can do about it: [image: nc method artifacts]

Quoting stuerp: "What do the values of the output of the sDFT represent? I know they're scaled between 0 and 1 but what is the unit?"

The actual output of SWIFT (after converting the complex-valued output to magnitude) uses the same unitless values as the other spectrum analyzer types: 0 corresponds to -Infinity dBFS, 0.5 to approximately -6 dBFS, and 1 to 0 dBFS.
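
A minimal C++ sketch of that mapping. The normalization step is an assumption for an unwindowed N-point DFT/sDFT bin (a full-scale sine centered on bin k gives |X[k]| = N/2); SWIFT or windowed variants need their own gain correction. `NormalizedMagnitude` and `MagnitudeToDbfs` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <limits>

// Assumption: unwindowed N-point DFT/sDFT bin, where a full-scale sine centered
// on the bin has |X[k]| = N/2, so 2|X[k]|/N maps it to 1.0 (0 dBFS).
double NormalizedMagnitude(std::complex<double> bin, double n) {
    assert(n > 0.0);
    return 2.0 * std::abs(bin) / n;
}

// Unitless magnitude to dBFS: 1 -> 0 dBFS, 0.5 -> about -6.02 dBFS, 0 -> -infinity.
double MagnitudeToDbfs(double magnitude) {
    if (magnitude <= 0.0) return -std::numeric_limits<double>::infinity();
    return 20.0 * std::log10(magnitude);
}
```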

BTW, the image above the second quote doesn't look right. This is what SWIFT should actually look like: [image: spectrum of 440 and 880 tones]. As I said before, the proper way to calculate a SWIFT/sDFT/analog-style/IIR filter bank analyzer is to make sure the audio chunk acquisition only gets new samples since the last calculation.

Also, the asymmetric response of SWIFT (it is actually symmetric on a linear frequency scale, as opposed to a logarithmic one for a biquad filter bank) means that bass-boosted songs literally mask the higher frequencies, especially when the filter order is below 2.

stuerp commented 6 months ago

Quoting TF3RDL: "As I said before, a proper way to calculate SWIFT/sDFT/analog-style/IIR filter bank analyzer would involve making sure that the audio chunk acquisition only gets new samples since the last calculation"

I missed that remark, and it did the trick:

[screenshot]

Synthetic 440 Hz sine wave with a little noise.

I'll expose it in the next version. First I have to get v0.7.0 released.

TF3RDL commented 6 months ago
  • Any reason why no window function is applied to the time domain?

There is a reason: no time-domain window function can be applied to any sDFT algorithm, because it breaks the sliding property. Windowing can be implemented in the frequency domain instead, but that is only relevant to VQ-sDFT (a complex-valued recursive FIR filter bank).

The closest thing to a sliding DFT with an asymmetric window function is the sliding windowed infinite Fourier transform (SWIFT). Even with a filter order higher than 4, it is faster than a traditional FIR sliding DFT with a window function like Hann applied. But since SWIFT is an IIR filter bank, it can't be skewed towards older samples the way you could with an FFT or a Goertzel-based CQT/VQT.
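
A minimal C++ sketch of one SWIFT bin in its basic first-order form, a complex one-pole resonator with an exponential (infinite) window: X[n] = e^{-1/τ} e^{jω} X[n-1] + x[n], where τ is the window time constant in samples. The "order" parameter here is an assumption (identical stages cascaded), which may not match TF3RDL's implementation; `SwiftBin` is a hypothetical name. The newest sample always has the largest weight, which is why the window cannot be skewed towards older samples.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// One SWIFT bin: X[n] = exp(-1/tau) * exp(j*w) * X[n-1] + x[n].
// Higher orders are modeled as identical cascaded stages (assumption).
class SwiftBin {
public:
    SwiftBin(double centerHz, double tauSamples, double sampleRate, int order)
        : stages_(static_cast<std::size_t>(order), std::complex<double>(0.0, 0.0)) {
        assert(order >= 1 && tauSamples > 0.0 && centerHz < sampleRate / 2.0);
        const double kPi = 3.14159265358979323846;
        double r = std::exp(-1.0 / tauSamples);  // exponential window decay per sample
        pole_ = std::polar(r, 2.0 * kPi * centerHz / sampleRate);
        // Each stage has gain 1/(1-r) at the center frequency; undo that and the
        // factor 1/2 from splitting a real sine into +/- frequency components.
        scale_ = 2.0 * std::pow(1.0 - r, order);
    }

    // Feed one sample; returns the normalized magnitude (about 1.0 for a
    // full-scale sine at the center frequency, once settled).
    double Push(double x) {
        std::complex<double> in(x, 0.0);
        for (auto& s : stages_) {
            s = pole_ * s + in;
            in = s;
        }
        return scale_ * std::abs(in);
    }

private:
    std::vector<std::complex<double>> stages_;
    std::complex<double> pole_;
    double scale_ = 0.0;
};
```

A single pole gives a response that is symmetric on a linear frequency axis, so its skirts reach far into the higher bands on a log axis; that is the masking effect described earlier for low filter orders.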

  • Is the bandwidth parameter an integer or is it a real number between 0 and 8?

The bandwidth parameter is a real number with a minimum value of 0 (constant bandwidth), not an integer.

BTW, both SWIFT and the analog-style analyzer can have an infinite maximum window size since they are IIR, and the maximum time resolution can be controlled by the same "FFT size" parameter for consistency with the other analyzer algorithms.

stuerp commented 6 months ago

@TF3RDL, I assume this issue can be closed?

TF3RDL commented 6 months ago

Probably, but the traditional FIR-based sDFT (the VQ-sDFT algorithm) would benefit performance-wise from using the AVX2 instruction set (assuming it is done properly), since it may use more CPU than SWIFT and the analog-style analyzer combined. I anticipate this because I experienced bad performance with a 1/24th-octave spectrum using a Hann-windowed CQ-sDFT (with peak decay calculated during the sDFT calculation) in my own WebAudio project that uses AudioWorklets.

BTW, the bandpass filter design I've implemented in JS for my CodePen project simply stacks the exact same second-order (biquad) bandpass filter multiple times when the order parameter is more than one, so it is not compliant with the ANSI S1.11-2004 specification. For a visualization component like this, that is actually fine (though the Enhanced Spectrum Analyzer might use an ANSI-compliant IIR filter bank if Crossover adds the analog-style analyzer mode).
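
A minimal C++ sketch of what that stacking does to the frequency response, assuming the RBJ "Audio EQ Cookbook" constant-0-dB-peak band-pass biquad (the thread does not say which biquad the CodePen uses). `StackedBandPassMagnitude` is a hypothetical name. Cascading identical stages multiplies their responses, so the result is |H(e^{jω})|^order: the peak stays at 0 dB while the skirts get steeper and the band narrows, but the passband shape is not the flat-topped one an ANSI S1.11 design would give.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Magnitude response of `order` identical RBJ band-pass biquads in series.
// This is plain stacking, not an ANSI S1.11-2004 filter design.
double StackedBandPassMagnitude(double freqHz, double centerHz, double q,
                                double sampleRate, int order) {
    assert(order >= 1 && q > 0.0 && centerHz > 0.0 && centerHz < sampleRate / 2.0);
    const double kPi = 3.14159265358979323846;
    double w0 = 2.0 * kPi * centerHz / sampleRate;
    double alpha = std::sin(w0) / (2.0 * q);
    // z^-1 evaluated on the unit circle at freqHz
    std::complex<double> zInv = std::polar(1.0, -2.0 * kPi * freqHz / sampleRate);
    std::complex<double> num = alpha * (1.0 - zInv * zInv);
    std::complex<double> den = (1.0 + alpha) - 2.0 * std::cos(w0) * zInv
                             + (1.0 - alpha) * zInv * zInv;
    return std::pow(std::abs(num / den), order);
}
```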