swharden / Spectrogram

.NET library for creating spectrograms (visual representations of frequency spectrum over time)
https://nuget.org/packages/Spectrogram
MIT License
320 stars 58 forks

Feature request: mel-frequency cepstral coefficients #58

Closed janjanusek closed 4 weeks ago

janjanusek commented 1 year ago

Hello, I've been playing with your library for a while, and first of all I think it's great.

Would it be possible for you to also implement mel-frequency cepstral coefficients, with their representation as a bitmap?

Thanks

swharden commented 4 weeks ago

It's been a while since this feature was requested 😅

For others that may land here, I'm happy to add this feature! If somebody creates a pull request to add it, I will be quick to review it, merge it in, and publish a package so it's available on NuGet in good time.

However, I'm unlikely to take the time to research what the request means thoroughly enough to implement it myself.


ChatGPT prompt: I maintain a C# library called 'spectrogram' which generates frequency spectrogram images from signals (often audio, but not always). A user requested that I implement "mel-frequency cepstral coefficients with it's representation in bitmap", but I do not know what that means. Can you explain what this feature is and what it could be used for?

ChatGPT response:

Mel-frequency cepstral coefficients (MFCCs) are features used frequently in audio and speech processing. They capture characteristics of audio in a way that mimics how humans perceive sound. MFCCs are particularly useful for tasks like speech and music recognition, speaker identification, and even environmental sound classification.

Here’s a breakdown:

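  1. Mel-Frequency Cepstrum: The “mel” scale is based on human hearing, which perceives pitch roughly logarithmically: the scale is close to linear at low frequencies and logarithmic at higher ones, so equal steps in mels sound about equally far apart. MFCCs are calculated by splitting a signal into short frames, transforming each frame into a power spectrum (typically with an FFT), applying a mel-scale filter bank, and then computing the logarithm of the energy in each band. Finally, a discrete cosine transform (DCT) of these log energies produces the MFCCs, a compact description of the spectrum's overall shape that emphasizes the more perceptually relevant parts of the audio signal.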

  2. MFCC Representation as Bitmap: Once MFCCs are calculated, they’re often displayed as a “heatmap”-style bitmap, where each row represents one coefficient index, time progresses along the x-axis, and pixel intensity or color encodes the coefficient's value. This bitmap visually represents how the sound's spectral characteristics change over time, highlighting features that may be significant for audio analysis.

  3. Use Cases: MFCCs are widely used in applications like:

    • Speech recognition: Recognizing spoken words and phrases.
    • Music genre classification: Identifying musical styles based on audio features.
    • Speaker identification: Recognizing or differentiating between speakers.
    • Environmental sound analysis: Recognizing non-speech sounds for use in surveillance or monitoring.
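
The pipeline from item 1 and the bitmap layout from item 2 can be sketched in a few dozen lines. What follows is a minimal, unoptimized Python sketch using only the standard library; it is not code from the Spectrogram library. All function names (`mfcc_matrix`, `mel_filter_bank`, `to_grayscale`, and so on) and the defaults (256-sample frames, 20 filters, 13 coefficients) are assumptions chosen for illustration, and the mel formula shown is just one common variant.

```python
import math

def hz_to_mel(hz):
    # One common mel-scale formula (assumed here; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    # Naive O(n^2) DFT to stay dependency-free; real code would use an FFT.
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        out.append((re * re + im * im) / n)
    return out

def mel_filter_bank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_fft // 2 + 1):
            f = b * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values):
    # Unnormalized DCT-II.
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n)]

def mfcc_frame(frame, bank, n_coeffs):
    n = len(frame)
    # Hann window to reduce spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    power = power_spectrum(windowed)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # offset avoids log(0)
    return dct2(log_energies)[:n_coeffs]

def mfcc_matrix(signal, sample_rate, frame_size=256, hop=128, n_filters=20, n_coeffs=13):
    bank = mel_filter_bank(n_filters, frame_size, sample_rate)
    starts = range(0, len(signal) - frame_size + 1, hop)
    columns = [mfcc_frame(signal[s:s + frame_size], bank, n_coeffs) for s in starts]
    # Transpose so that rows are coefficients and columns are time steps.
    return [list(row) for row in zip(*columns)]

def to_grayscale(matrix):
    # Min-max scale all values to 0..255 so the matrix can be drawn as a bitmap.
    lo = min(min(row) for row in matrix)
    hi = max(max(row) for row in matrix)
    span = (hi - lo) or 1.0
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]

if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(1024)]
    pixels = to_grayscale(mfcc_matrix(tone, rate))
    print(len(pixels), "rows (coefficients) x", len(pixels[0]), "columns (frames)")
```

Each inner list of `pixels` is one image row, so writing it out with any bitmap API gives the heatmap layout described in item 2. The first coefficient mostly tracks overall loudness, which is why some implementations drop it or scale it separately before display.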

To implement this feature in your 'spectrogram' library, you’d: