Mel-Spectrogram Image Generation Issues

swharden / Spectrogram

.NET library for creating spectrograms (visual representations of frequency spectrum over time)

https://nuget.org/packages/Spectrogram

MIT License

Issue #32, closed by jwoodtke 3 years ago

jwoodtke commented 3 years ago

Hi, I'm trying to troubleshoot an error with my mel-spectrogram image generation. The saved .png image looks like the following:

Here is the code I am using:

var sg = new SpectrogramGenerator(sampleRate, FftSize, StepSize, 0, MaxFreq);
sg.Add(audio);

var bitmap = sg.GetBitmapMel(MelBinCount, Intensity, SaveAsDb);
bitmap.Save(file + "MelSpec.png", ImageFormat.Png);

where: FftSize = 2048, StepSize = 300, MaxFreq = 3000, Intensity = 5, MelBinCount = 200, SaveAsDb = false

Also, I am using the ReadWAV method as it is shown in the readme to read .wav audio inputs.

Any help would be appreciated.

swharden commented 3 years ago

Hi @jwoodtke,

This looks good to me. Is your input audio a single continuous combination of tones that does not change over time? If so, your spectrogram will be identical (Y) at every time point (X) and appear similar to the screenshot you posted.

If not, can you attach a WAV or MP3 of your audio file here so I can take a closer look?

Thanks! Scott

jwoodtke commented 3 years ago

Hey thanks for getting back so quick! @swharden

I am using songs from the GTZAN wav dataset so they are varying tone audio signals (I can't post audio files on here, but I can add you to a google drive file or something though).

This started happening when I started reading audio files and converting them asynchronously. I'm not sure if that's relevant, but if it is, here is my async audio-reading code:

private async Task<(double[] audio, int sampleRate, double length)> ReadWav(string file, double multiplier = 10000)
{
    await using (var afr = new AudioFileReader(file))
    {
        int sampleRate = afr.WaveFormat.SampleRate;
        int sampleCount = (int) (afr.Length / afr.WaveFormat.BitsPerSample / 8);
        int channelCount = afr.WaveFormat.Channels;
        var audio = new List<double>(sampleCount);
        var buffer = new byte[sampleRate * channelCount];
        var length = afr.TotalTime;
        int samplesRead = 0;
        while ((samplesRead = await afr.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await AddAudioRange(audio, buffer.Take(samplesRead).Select(x => x * multiplier));
        }
        return (audio.ToArray(), sampleRate, length.TotalSeconds);
    }
}

private async Task AddAudioRange(List<double> audio, IEnumerable<double> samplesRead)
{
    audio.AddRange(samplesRead);
}
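
A side note on the sampleCount line: C# division is evaluated left to right, so `afr.Length / afr.WaveFormat.BitsPerSample / 8` divides the byte length by BitsPerSample and then by 8, rather than by the number of bytes per sample. Here the value only sets the list's initial capacity, so it should not affect the output. The sketch below shows the difference using made-up numbers (not values read from a real file):

```csharp
using System;

// Hypothetical values standing in for afr.Length (in bytes) and
// afr.WaveFormat.BitsPerSample; they are not read from a real file.
long lengthInBytes = 1_024_000;
int bitsPerSample = 32;

// As written in the post: evaluated left to right, i.e. (length / 32) / 8.
long asWritten = lengthInBytes / bitsPerSample / 8;

// Dividing by the number of bytes per sample instead.
long bytesPerSample = bitsPerSample / 8;
long sampleCount = lengthInBytes / bytesPerSample;

Console.WriteLine($"as written: {asWritten}, length / bytesPerSample: {sampleCount}");
```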

swharden commented 3 years ago

> This started happening when I started reading audio files and converting them asynchronously

This makes me think it indeed has to do with the async code (not a problem with the Spectrogram library itself). Is it possible the buffer byte array is not fully populated when AddAudioRange() is called? I suspect this may be causing your issue.

jwoodtke commented 3 years ago

You were totally right @swharden! I think another problem was that the buffer array held bytes rather than floats.
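
To illustrate why bytes versus floats matters, here is a minimal sketch using only the base class library. It assumes the stream carries 32-bit float samples (my understanding of what the byte-level reads on AudioFileReader return, but treat that as an assumption) and uses three made-up sample values:

```csharp
using System;
using System.Linq;

// Three made-up samples in the -1..1 range, stored as 32-bit floats
// (four bytes per sample).
float[] samples = { 0.5f, -0.25f, 0.0f };
byte[] raw = new byte[samples.Length * sizeof(float)];
Buffer.BlockCopy(samples, 0, raw, 0, raw.Length);

// Treating every byte as its own sample yields four values per real
// sample, each between 0 and 255 and unrelated to the waveform.
double[] perByte = raw.Select(b => (double)b).ToArray();

// Decoding four bytes at a time recovers the original values.
double[] decoded = Enumerable.Range(0, samples.Length)
    .Select(i => (double)BitConverter.ToSingle(raw, i * sizeof(float)))
    .ToArray();

Console.WriteLine("per byte: " + string.Join(", ", perByte));
Console.WriteLine("decoded:  " + string.Join(", ", decoded));
```

Reading into a float[] lets the reader do that decoding, which is what the updated code further down does.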

However, now the mel-spectrograms seem to be missing the bass component. Here is a sample image:

[Image: pop 00013MelSpec]

Also, here is the new code for audio reads:

private async Task<(double[] audio, int sampleRate, double length)> ReadWav(string file, double multiplier = 16000)
{
    await using (var afr = new AudioFileReader(file))
    {
        int sampleRate = afr.WaveFormat.SampleRate;
        int sampleCount = (int) (afr.Length / afr.WaveFormat.BitsPerSample / 8);
        int channelCount = afr.WaveFormat.Channels;
        var audio = new List<double>(sampleCount);
        var buffer = new float[sampleRate * channelCount];
        var length = afr.TotalTime;
        int samplesRead = 0;
        while ((samplesRead = await PopulateBufferArray(afr, buffer)) > 0)
        {
            await AddAudioRange(audio, buffer.Take(samplesRead).Select(x => x * multiplier));
        }
        return (audio.ToArray(), sampleRate, length.TotalSeconds);
    }
}

private async Task<int> PopulateBufferArray(AudioFileReader afr, float[] buffer)
{
    return afr.Read(buffer, 0, buffer.Length);
}

private async Task AddAudioRange(List<double> audio, IEnumerable<double> samplesRead)
{
    audio.AddRange(samplesRead);
}
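
One thing worth noting about the two helper methods: an async method with no await in its body runs synchronously on the caller's thread and hands back an already-completed task (the compiler reports this as warning CS1998). The sketch below shows the difference from queuing the work with Task.Run. BlockingRead is a hypothetical stand-in for a call like afr.Read, and whether offloading file reads is worthwhile at all is a separate question:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a blocking read such as afr.Read(...).
static int BlockingRead()
{
    Thread.Sleep(50);
    return 42;
}

// No await inside: the body runs to completion before the caller
// gets the task back.
#pragma warning disable CS1998
static async Task<int> ReadWithoutAwait() => BlockingRead();
#pragma warning restore CS1998

// Task.Run queues the work to the thread pool and returns right away.
static Task<int> ReadOnThreadPool() => Task.Run(() => BlockingRead());

Task<int> first = ReadWithoutAwait();
bool firstAlreadyDone = first.IsCompleted;

Task<int> second = ReadOnThreadPool();
int result = await second;

Console.WriteLine($"completed on return: {firstAlreadyDone}; result: {result}");
```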

jwoodtke commented 3 years ago

Never mind, I think the mel bin count I was using was too high. It's all iterative adjustment from here on out, thanks again for the help!
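
For anyone hitting the same symptom, a rough way to see why a high mel bin count can hollow out the low end is to compare the width of the lowest mel bins with the FFT's frequency resolution. The sketch below uses the common 2595 * log10(1 + f / 700) mel formula, which may not match what the library does internally, and assumes a 22,050 Hz sample rate because the thread does not state one. FftSize, MaxFreq and MelBinCount are the values from the first post:

```csharp
using System;

// Values from the first post, plus an assumed sample rate.
int sampleRate = 22050;   // assumption: not stated in the thread
int fftSize = 2048;
double maxFreq = 3000;
int melBinCount = 200;

// Common mel-scale conversion; the library may use a different one.
static double HzToMel(double hz) => 2595 * Math.Log10(1 + hz / 700);
static double MelToHz(double mel) => 700 * (Math.Pow(10, mel / 2595) - 1);

// Spacing between adjacent FFT bins, in Hz.
double fftBinHz = (double)sampleRate / fftSize;

// Width in Hz of the lowest and highest of melBinCount equal-width mel bins.
double melPerBin = HzToMel(maxFreq) / melBinCount;
double lowestBinHz = MelToHz(melPerBin);
double highestBinHz = maxFreq - MelToHz(melPerBin * (melBinCount - 1));

Console.WriteLine($"FFT bin spacing: {fftBinHz:F2} Hz");
Console.WriteLine($"Lowest mel bin:  {lowestBinHz:F2} Hz wide");
Console.WriteLine($"Highest mel bin: {highestBinHz:F2} Hz wide");
```

Under these assumptions the lowest mel bins come out narrower than a single FFT bin, so there is not enough frequency resolution to fill them individually. How that shows up in the image depends on how the bins are built, but lowering MelBinCount (or raising FftSize) widens them relative to the FFT spacing.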