Feature request: mel-frequency cepstral coefficients

It's been a while since this feature was requested 😅

For others who may land here, I'm happy to add this feature! If somebody creates a pull request to add it, I will be quick to review it, merge it in, and publish a package so it's available on NuGet in good time.

However, I'm unlikely to take the time to research what the request means thoroughly enough to implement it myself.

ChatGPT prompt: I maintain a C# library called 'spectrogram' which generates frequency spectrogram images from signals (often audio, but not always). A user requested that I implement "mel-frequency cepstral coefficients with it's representation in bitmap", but I do not know what that means. Can you explain what this feature is and what it could be used for?

ChatGPT response:

Mel-frequency cepstral coefficients (MFCCs) are features used frequently in audio and speech processing. They capture characteristics of audio in a way that mimics how humans perceive sound. MFCCs are particularly useful for tasks like speech and music recognition, speaker identification, and even environmental sound classification.

Here’s a breakdown:

Mel-Frequency Cepstrum: The “mel” scale is based on human hearing, which perceives pitch roughly logarithmically (the scale is close to linear below about 1 kHz and logarithmic above it). MFCCs are calculated by splitting a signal into short frames, transforming each frame into a spectrum, applying a mel-scale filter bank, and then computing the logarithm of the energy in each band. Finally, a discrete cosine transform (DCT) of these log energies produces the MFCCs, which emphasize the more perceptually relevant parts of the audio signal.
MFCC Representation as Bitmap: Once MFCCs are calculated, they’re often displayed as a “heatmap”-style bitmap, where each row represents a different MFCC (or band), and the time progression is represented along the x-axis. This bitmap visually represents how the sound's spectral characteristics change over time, highlighting features that may be significant for audio analysis.
Use Cases: MFCCs are widely used in applications like:
- Speech recognition: Recognizing spoken words and phrases.
- Music genre classification: Identifying musical styles based on audio features.
- Speaker identification: Recognizing or differentiating between speakers.
- Environmental sound analysis: Recognizing non-speech sounds for use in surveillance or monitoring.
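
The calculation described under "Mel-Frequency Cepstrum" above can be sketched for a single frame as follows. This is a minimal illustration rather than a reference implementation: it uses only the Python standard library so that each step maps onto plain loops that could be ported to C#. The function names, the Hann window, the filter count, and the particular mel formula are all assumptions chosen for the example, and a real implementation would use an FFT instead of the naive DFT shown here.

```python
import cmath
import math


def hz_to_mel(hz):
    # One common mel formula; other variants exist (assumption for this sketch).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    # Hann-windowed naive O(n^2) DFT; assumes the frame has at least 2 samples.
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filter_bank(filter_count, fft_size, sample_rate):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    max_mel = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(max_mel * i / (filter_count + 1))
                for i in range(filter_count + 2)]
    bin_hz = sample_rate / fft_size
    bank = []
    for m in range(1, filter_count + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        weights = []
        for k in range(fft_size // 2 + 1):
            f = k * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc(frame, sample_rate, filter_count=20, coefficient_count=13):
    spectrum = power_spectrum(frame)
    bank = mel_filter_bank(filter_count, len(frame), sample_rate)
    # Log of the energy in each mel band (small offset avoids log(0)).
    log_energies = [
        math.log(sum(w * p for w, p in zip(weights, spectrum)) + 1e-10)
        for weights in bank
    ]
    # DCT-II of the log energies yields the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / filter_count)
                for m, e in enumerate(log_energies))
            for c in range(coefficient_count)]


if __name__ == "__main__":
    sample_rate = 8000
    # A 256-sample frame containing a 440 Hz sine wave.
    frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
    print(len(mfcc(frame, sample_rate)))  # 13
```

Calling `mfcc` on each successive (usually overlapping) frame of a signal gives one column of coefficients per time step, which is the data the bitmap would display.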

To implement this feature in your 'spectrogram' library, you’d:

- Calculate MFCCs from the input signal (libraries like Librosa for Python are popular for this, but there are C# implementations as well).
- Map these MFCC values to a color scale to create a bitmap representation, similar to a traditional spectrogram but with MFCCs instead of raw frequencies.
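
As a rough illustration of the mapping step, the sketch below scales a sequence of per-frame coefficient lists into 0-255 intensities, with one row per coefficient and one column per time step. The function name `mfcc_to_pixels` and the global min/max normalization are assumptions made for this example; the resulting intensities could then be passed through whatever colormap the bitmap uses.

```python
def mfcc_to_pixels(frames):
    """Scale per-frame coefficient lists to rows of 0-255 intensities.

    Rows of the result correspond to coefficients, columns to time steps.
    """
    lowest = min(min(frame) for frame in frames)
    highest = max(max(frame) for frame in frames)
    span = (highest - lowest) or 1.0  # avoid dividing by zero for flat input
    coefficient_count = len(frames[0])
    return [[round(255 * (frame[c] - lowest) / span) for frame in frames]
            for c in range(coefficient_count)]


# Three time steps with two coefficients each (made-up values).
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
pixels = mfcc_to_pixels(frames)
print(pixels)  # [[0, 102, 204], [51, 153, 255]]
```

In practice the first coefficient often has a much larger magnitude than the rest, so normalizing each row separately (or dropping the first coefficient) may give a more readable image than the single global range used here.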
