Closed jwoodtke closed 3 years ago
Hi @jwoodtke,
This looks good to me. Is your input audio a single continuous combination of tones that does not change over time? If so, your spectrogram will be identical (Y) at every time point (X) and appear similar to the screenshot you posted.
If not, can you attach a WAV or MP3 of your audio file here so I can take a closer look?
Thanks! Scott
Hey, thanks for getting back so quickly! @swharden
I am using songs from the GTZAN WAV dataset, so they are varying-tone audio signals. (I can't post audio files here, but I could add you to a Google Drive folder or something.)
This started happening when I started reading audio files and converting them asynchronously. I'm not sure if that's relevant, but here is my corresponding async audio-reading code:
private async Task<(double[] audio, int sampleRate, double length)> ReadWav(string file, double multiplier = 10000)
{
    await using (var afr = new AudioFileReader(file))
    {
        int sampleRate = afr.WaveFormat.SampleRate;
        int sampleCount = (int)(afr.Length / afr.WaveFormat.BitsPerSample / 8);
        int channelCount = afr.WaveFormat.Channels;
        var audio = new List<double>(sampleCount);
        var buffer = new byte[sampleRate * channelCount];
        var length = afr.TotalTime;
        int samplesRead = 0;
        while ((samplesRead = await afr.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await AddAudioRange(audio, buffer.Take(samplesRead).Select(x => x * multiplier));
        }
        return (audio.ToArray(), sampleRate, length.TotalSeconds);
    }
}

private async Task AddAudioRange(List<double> audio, IEnumerable<double> samplesRead)
{
    audio.AddRange(samplesRead);
}
You wrote: "This started happening when I started reading audio files and converting them asynchronously." This makes me think it indeed has to do with the async code (not a problem with the Spectrogram library itself). Is it possible the buffer byte array is not fully populated when AddAudioRange() is called? I suspect this may be causing your issue.
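There is also a unit mismatch worth checking: ReadAsync on the byte stream returns a count of bytes, but each 16-bit PCM sample spans two little-endian bytes, so the bytes must be decoded before any scaling. A minimal self-contained sketch of the difference, using hypothetical sample values (not tied to NAudio):

```csharp
using System;

class BytesVsSamples
{
    static void Main()
    {
        // Two 16-bit PCM samples encoded as little-endian bytes: +16384 and -16384
        byte[] buffer = { 0x00, 0x40, 0x00, 0xC0 };

        // Bug: treating each byte as its own sample yields 4 meaningless values
        foreach (byte b in buffer)
            Console.WriteLine(b); // prints 0, 64, 0, 192 -- not audio

        // Fix: decode byte pairs into 16-bit samples, then scale to [-1, 1]
        for (int i = 0; i < buffer.Length; i += 2)
        {
            short s = BitConverter.ToInt16(buffer, i); // assumes a little-endian host
            Console.WriteLine(s / 32768.0); // prints 0.5, then -0.5
        }
    }
}
```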
You were totally right @swharden! I think another problem was that the buffer array was in bytes rather than floats.
However, now the mel-spectrograms seem to be missing the bass component. Here is a sample image:
Also, here is the new code for audio reads:
private async Task<(double[] audio, int sampleRate, double length)> ReadWav(string file, double multiplier = 16000)
{
    await using (var afr = new AudioFileReader(file))
    {
        int sampleRate = afr.WaveFormat.SampleRate;
        // total samples = total bytes / bytes-per-sample
        int sampleCount = (int)(afr.Length / (afr.WaveFormat.BitsPerSample / 8));
        int channelCount = afr.WaveFormat.Channels;
        var audio = new List<double>(sampleCount);
        var buffer = new float[sampleRate * channelCount]; // holds one second of samples
        var length = afr.TotalTime;
        int samplesRead;
        while ((samplesRead = await PopulateBufferArray(afr, buffer)) > 0)
        {
            AddAudioRange(audio, buffer.Take(samplesRead).Select(x => x * multiplier));
        }
        return (audio.ToArray(), sampleRate, length.TotalSeconds);
    }
}

// AudioFileReader.Read(float[], ...) is synchronous, so offload it to the
// thread pool rather than declaring an async method with no await
private Task<int> PopulateBufferArray(AudioFileReader afr, float[] buffer)
{
    return Task.Run(() => afr.Read(buffer, 0, buffer.Length));
}

private void AddAudioRange(List<double> audio, IEnumerable<double> samplesRead)
{
    audio.AddRange(samplesRead);
}
Never mind, I think the mel bin count I was using was too high. It's all iterative adjustment from here on out. Thanks again for the help!
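For anyone hitting the same missing-bass symptom, the bin math can be checked directly: a 2048-point FFT has a frequency resolution of sampleRate / 2048, and squeezing 200 mel bands into 0-3000 Hz makes the lowest bands narrower than a single FFT bin, so they receive no energy and the bass region looks empty. A quick sketch, assuming GTZAN's 22050 Hz sample rate and the standard 2595·log10(1 + f/700) mel formula:

```csharp
using System;

class MelBins
{
    // Standard mel-scale conversions
    static double HzToMel(double hz) => 2595 * Math.Log10(1 + hz / 700);
    static double MelToHz(double mel) => 700 * (Math.Pow(10, mel / 2595) - 1);

    static void Main()
    {
        // Assumed values: GTZAN audio is 22050 Hz; the other settings are from this thread
        int sampleRate = 22050, fftSize = 2048, melBinCount = 200;
        double maxFreq = 3000;

        double fftBinWidthHz = (double)sampleRate / fftSize; // ~10.8 Hz per FFT bin
        double melPerBand = HzToMel(maxFreq) / melBinCount;

        // Width in Hz of the lowest and highest mel bands
        double lowBandHz = MelToHz(melPerBand);
        double highBandHz = MelToHz(melPerBand * melBinCount) - MelToHz(melPerBand * (melBinCount - 1));

        Console.WriteLine($"FFT bin width:    {fftBinWidthHz:F1} Hz");
        Console.WriteLine($"lowest mel band:  {lowBandHz:F1} Hz wide");
        Console.WriteLine($"highest mel band: {highBandHz:F1} Hz wide");
        // When the lowest mel bands are narrower than one FFT bin, many
        // low-frequency bands have no FFT bin inside them and stay empty.
    }
}
```

Fewer mel bins (or a lower bin count relative to MaxFreq) widens the low bands past the FFT resolution, which matches the fix described above.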
Hi, I'm trying to troubleshoot an error with my mel-spectrogram image generation. The saved .png image looks like the following:
Here is the code I am using:
where:
FFTSize = 2048
StepSize = 300
MaxFreq = 3000
Intensity = 5
MelBinCount = 200
SaveAsDb = false
Also, I am using the ReadWAV method as shown in the README to read .wav audio inputs.
Any help would be appreciated.
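For reference, a hedged sketch of how settings like these might be wired into swharden's Spectrogram package; the SpectrogramGenerator constructor and GetBitmapMel() are recalled from its README, so treat the exact parameter names as assumptions and verify against the version you have installed:

```csharp
// Assumed Spectrogram-package API -- check against your installed version
(double[] audio, int sampleRate) = ReadWavMono("song.wav"); // ReadWAV-style helper from the README

var sg = new SpectrogramGenerator(sampleRate, fftSize: 2048, stepSize: 300, maxFreq: 3000);
sg.Add(audio);

// Mel-scaled bitmap using the posted settings (linear intensity, not dB)
Bitmap bmp = sg.GetBitmapMel(melBinCount: 200, intensity: 5, dB: false);
bmp.Save("mel.png", ImageFormat.Png);
```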