libAudioFlux / audioFlux

A library for audio and music analysis, feature extraction.
https://audioflux.top
MIT License
2.81k stars 119 forks

cqcc feature extract #40

Open zhengzhezhe opened 3 months ago

zhengzhezhe commented 3 months ago

Hi, I want to ask how CQCC features are extracted. The output is not the same as that of the MATLAB version.

liufeigit commented 3 months ago

Constant-Q transform: $$X[k]=\frac1{N[k]}\sum_{n=0}^{N[k]-1}x[n]W_k[n]e^{\frac{-j2\pi Qn}{N[k]} }$$
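To make the formula concrete, here is a minimal sketch that evaluates it directly for a single bin. It is not how audioFlux computes the CQT. The function name `cqt_bin`, the choice of a periodic Hann window for $W_k[n]$, and the test tones are assumptions made only for illustration.

```python
import cmath
import math


def cqt_bin(x, n_k, q):
    """Evaluate X[k] = (1/N[k]) * sum_n x[n] W_k[n] exp(-j*2*pi*Q*n/N[k]) for one bin."""
    if len(x) < n_k:
        raise ValueError("signal is shorter than the window length n_k")
    acc = 0j
    for n in range(n_k):
        # Assumption: W_k[n] is a periodic Hann window of length n_k.
        w = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_k)
        acc += x[n] * w * cmath.exp(-2j * math.pi * q * n / n_k)
    return acc / n_k


# With fs = 8000, Q = 17 and N[k] = 680 the bin is centred on Q * fs / N[k] = 200 Hz.
fs, q, n_k = 8000, 17, 680
tone = [math.cos(2.0 * math.pi * 200.0 * n / fs) for n in range(n_k)]
octave_up = [math.cos(2.0 * math.pi * 400.0 * n / fs) for n in range(n_k)]

on_bin = abs(cqt_bin(tone, n_k, q))        # tone at the bin's centre frequency
off_bin = abs(cqt_bin(octave_up, n_k, q))  # tone one octave above the bin
```

The on-bin magnitude is about 0.25 (half the cosine amplitude times the Hann window's mean of 0.5), while the tone an octave higher is rejected almost completely.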

In music analysis, this transform and the chroma features derived from it are widely used. A standard CQT (Constant-Q Transform) implementation needs a very long window to reach a given frequency resolution, because $N[k]=Q\frac{f_s}{f_k}$ grows as the center frequency $f_k$ decreases. Although the FFT can be used to accelerate the computation, it is still too slow for most practical applications.
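A small calculation shows how quickly the window length grows at low frequencies. This is only a sketch: the definitions $Q = 1/(2^{1/B}-1)$ and $f_k = f_{min} \cdot 2^{k/B}$ (with $B$ bins per octave) are common conventions assumed here, not something stated in this thread, and the function name and parameter values are hypothetical.

```python
import math


def cqt_window_lengths(fs, f_min, bins_per_octave, num_bins):
    """Return (Q, [N[k] for each bin]) using N[k] = Q * fs / f_k."""
    # Assumed convention: Q = 1 / (2^(1/B) - 1) for B bins per octave.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    lengths = []
    for k in range(num_bins):
        # Assumed convention: geometrically spaced center frequencies.
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        lengths.append(math.ceil(q * fs / f_k))
    return q, lengths


# Example values (assumptions): 44.1 kHz audio, lowest bin at 32.70 Hz, 7 octaves.
q, lengths = cqt_window_lengths(fs=44100, f_min=32.70, bins_per_octave=12, num_bins=84)
print(f"Q = {q:.3f}")
print("lowest bin window:", lengths[0], "samples")
print("highest bin window:", lengths[-1], "samples")
```

With these values the lowest bin needs a window of more than 20,000 samples, while the length halves for every octave above it.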

The typical approach exploits the fact that a constant Q matches the spacing of musical tones: N[k] only has to be solved within a single octave, where it is much smaller. If variable bandwidth ratios are not considered, the frequency-domain filter bank is the same for every octave. After each octave is computed, the data are downsampled by a factor of 2 before being used for the next octave. This method is essentially a shortcut version of an efficient CQT implementation proposed in the 1990s, and most libraries that implement the standard CQT are based on that paper.
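The bookkeeping behind this octave-by-octave scheme can be sketched as follows. The code only computes kernel lengths and does not perform any filtering or downsampling. The function name, the convention $Q = 1/(2^{1/B}-1)$, and the parameter values are assumptions for illustration.

```python
import math


def octave_kernel_lengths(fs, f_min, bins_per_octave, n_octaves):
    """Return [(sample_rate, [N[k] per bin])] for each octave, highest octave first."""
    # Assumed convention: Q = 1 / (2^(1/B) - 1) for B bins per octave.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    result = []
    for octave in range(n_octaves):
        # The signal has been downsampled by 2 once per octave already processed.
        fs_o = fs / 2 ** octave
        # Lowest center frequency of the octave handled at this stage.
        base = f_min * 2 ** (n_octaves - 1 - octave)
        lengths = [
            math.ceil(q * fs_o / (base * 2.0 ** (b / bins_per_octave)))
            for b in range(bins_per_octave)
        ]
        result.append((fs_o, lengths))
    return result


per_octave = octave_kernel_lengths(fs=44100, f_min=32.70, bins_per_octave=12, n_octaves=7)
for fs_o, lengths in per_octave:
    print(f"fs = {fs_o:8.1f} Hz, longest kernel = {max(lengths)} samples")
```

Because the sample rate and the center frequencies both halve from one octave to the next, the ratio $f_s/f_k$ and therefore the kernel lengths are identical in every octave, so one small filter bank can be reused throughout.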

Later, the Non-Stationary Gabor Transform was proposed as an optimal solution to address issues related to CQT. It offers significant improvements in efficiency, effectiveness, and invertibility.

Non-Stationary Gabor Transform: $$X(m,k)=\frac1{N[k]} \sum_{n=0}^{L-1} x[n] W_k[n]e^{\frac{j2\pi m(n-\omega_k) }{N[k]} }$$

MATLAB’s cqt is implemented using the Non-Stationary Gabor Transform approach. AudioFlux provides implementations of both the standard CQT and NSGT, so MATLAB’s CQT and AudioFlux’s NSGT are more consistent with each other.

Finally, in the field of numerical computation, it is challenging for different frameworks to produce exactly the same values. However, the issue you mentioned regarding CQCC is primarily due to different mechanisms in algorithm implementations. Even for the same algorithm, factors such as optimization techniques and precision in numerical computation make it difficult to achieve identical values.
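As a small illustration of the last point, even the order in which floating-point numbers are added changes the result, so outputs of two implementations are better compared with a tolerance than with exact equality. This is a generic Python sketch and does not use audioFlux or MATLAB; the tolerance value is an arbitrary choice.

```python
import math

values = [0.1, 0.2, 0.3]

# Same numbers, same algorithm (a plain sum), different accumulation order.
forward = 0.0
for v in values:
    forward += v

backward = 0.0
for v in reversed(values):
    backward += v

exactly_equal = forward == backward
# Compare with a relative tolerance instead of exact equality.
close_enough = math.isclose(forward, backward, rel_tol=1e-12)

print(forward, backward, exactly_equal, close_enough)
```

When checking feature matrices from two frameworks against each other, the same idea applies element by element.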

zhengzhezhe commented 3 months ago

Thank you very much for your reply. If I want to extract cqcc features of audio using audioflux, is this how I use it:

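   import numpy
   import audioflux

   # x: audio samples as a 1-D array (loaded elsewhere)
   cc = audioflux.CQT()
   m_data_arr = cc.cqt(x)
   fea = cc.cqcc(m_data_arr)
   # first- and second-order deltas of the CQCC features
   fea1 = audioflux.utils.delta(fea)
   fea2 = audioflux.utils.delta(fea1)
   fea_cqcc = numpy.concatenate((fea1, fea2, fea), axis=0)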

Is this the standard CQT? If I want to align the output with the MATLAB version, should I replace “cc = audioflux.CQT(); m_data_arr = cc.cqt(x)” with “gg = audioflux.NSGT(); m_data_arr = gg.nsgt(x)”?

Or can you tell me how to correctly use audioflux to extract CQCC features? Thank you!