Hello @romanshrestha17-iv,
Thank you for posting this issue and moving the conversation here! Is the file you're using to generate these images public?
Thanks, Christian
cc @mthrok @vincentqb
Hi @cpuhrsch,
Thank you for your prompt response. Yes, I generated spectrograms for some of the files from the Kaggle dataset just to test how the spectrograms would look.
We are moving on to a confidential dataset soon, but if torchaudio is able to generate spectrograms that look as good as Librosa's, that would be great. Please find the link to the dataset below.
Thanks for opening the issue! I'm getting a very similar MelSpectrogram here with that dataset. Do you have code we can use to reproduce?
@romanshrestha17-iv If it's for manual inspection, I'd say you should just stick with librosa, which has many default values set up nicely. Otherwise, check out options like i) using a mel-spectrogram instead of the STFT, ii) using decibel scaling, iii) clamping the input of the decibel scaling (check out how librosa does it), or similarly, doing linear_to_decibel(1 + abs(melspectrogram)).
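For reference, options ii) and iii) above can be sketched without any audio library. This is a minimal, dependency-free illustration of decibel scaling with clamping, loosely modelled on the defaults of librosa's `power_to_db` (`ref=1.0`, `amin=1e-10`, `top_db=80.0`). The function name and the nested-list input format are assumptions made for this example, not torchaudio or librosa API.

```python
import math


def power_to_db(power_spec, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert a power spectrogram (list of rows) to decibels.

    Hypothetical helper for illustration only; it mirrors the usual
    10 * log10(max(amin, S) / ref) formula with an optional dynamic-range clamp.
    """
    ref_db = 10.0 * math.log10(max(amin, ref))
    # Clamp the input from below so log10 never sees zero.
    db = [[10.0 * math.log10(max(amin, v)) - ref_db for v in row]
          for row in power_spec]
    if top_db is not None:
        # Keep only the top `top_db` decibels below the loudest bin.
        floor = max(max(row) for row in db) - top_db
        db = [[max(v, floor) for v in row] for row in db]
    return db


# Toy 2x2 "mel spectrogram" with a zero and a very small value.
mel = [[1.0, 0.0], [100.0, 1e-12]]
db = power_to_db(mel)

# The "1 + abs(x)" variant from the comment above: the input is always >= 1,
# so the output is always >= 0 dB and no separate clamp is needed.
db_shifted = power_to_db([[1.0 + abs(v) for v in row] for row in mel],
                         top_db=None)
```

Without the clamp, near-silent bins stretch the value range so far that the audible content is squeezed into a narrow band of grey levels after min-max scaling, which is one plausible reason two spectrogram images of the same audio can look very different.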
@keunwoochoi Thanks for the suggestions. I've tried all of them and there is still not much difference in the torchaudio spectrograms. I'm using librosa at the moment. I'm particularly interested in torchaudio because GPUs could help accelerate the spectrogram generation process.
Thank you for investigating this issue, Vincent. A snippet of the code used to generate the log-mel spectrogram with librosa is here:
import librosa as lr
import numpy as np
import skimage.io


def scale_minmax(X, min=0.0, max=1.0):
    # linearly rescale X to the range [min, max]
    X_std = (X - X.min()) / (X.max() - X.min())
    X_scaled = X_std * (max - min) + min
    return X_scaled


def spectrogram_image(y, sr, out, hop_length, n_mels):
    # use log-melspectrogram
    mels = lr.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                     n_fft=hop_length*2, hop_length=hop_length)
    mels = np.log(mels + 1e-9)  # add small number to avoid log(0)

    # min-max scale to fit inside 8-bit range
    img = scale_minmax(mels, 0, 255).astype(np.uint8)
    img = np.flip(img, axis=0)  # put low frequencies at the bottom in image
    img = 255 - img  # invert. make black==more energy

    # save as PNG
    skimage.io.imsave(out, img)


if __name__ == '__main__':
    # settings
    hop_length = 512  # number of samples per time-step in spectrogram
    n_mels = 128  # number of bins in spectrogram. Height of image
    time_steps = 384  # number of time-steps. Width of image

    # audio_f is the list of audio file paths (defined elsewhere, not shown here)
    spk_ID = [path.split('/')[-1].lower() for path in audio_f]

    for path, name in zip(audio_f, spk_ID):
        # load audio
        y, sr = lr.load(path)
        out = "{}.png".format(name)

        # extract a fixed length window
        start_sample = 0  # starting at beginning
        length_samples = time_steps * hop_length
        window = y[start_sample:start_sample + length_samples]

        # convert to PNG
        spectrogram_image(window, sr=sr, out=out, hop_length=hop_length, n_mels=n_mels)

    print('done!')
@romanshrestha17-iv Could you also please post the code you used to generate the torchaudio image?
@cpuhrsch The code used to generate the torchaudio image is similar to what @vincentqb has shared in his notebook here
We have updated the tutorial to show how to generate a MelSpectrogram that is numerically comparable with librosa's. Please check out the new tutorial: https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html#id1
Up to release version 0.8, torchaudio could only generate the equivalent of librosa's htk=True spectrograms, but recently we also added support for htk=False. We will follow up on this in the next release.
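To make the htk=True / htk=False distinction concrete, here is a small sketch of the two Hz-to-mel conversions as they are commonly defined: the HTK formula, and the Slaney-style scale that librosa uses when htk=False (linear below 1 kHz, logarithmic above). The function names are hypothetical and the constants are written from the usual definitions, so treat this as an illustration rather than either library's code.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style mel scale: purely logarithmic."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def hz_to_mel_slaney(freq_hz):
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0               # Hz per mel in the linear region
    min_log_hz = 1000.0              # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp  # = 15 mels at 1 kHz
    logstep = math.log(6.4) / 27.0   # step size in the log region
    if freq_hz >= min_log_hz:
        return min_log_mel + math.log(freq_hz / min_log_hz) / logstep
    return freq_hz / f_sp


# The two scales place the same frequencies at very different mel values,
# so the resulting filterbanks (and images) differ even with equal n_mels.
for f in (0.0, 500.0, 1000.0, 4000.0):
    print(f, hz_to_mel_htk(f), hz_to_mel_slaney(f))
```

Because the mel filter centres are spaced evenly on whichever scale is chosen, mixing the two conventions between libraries shifts every filter, which is enough to make otherwise identical pipelines produce visibly different mel spectrograms.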
Hi,
I am highly interested in generating spectrograms on the GPU. However, the spectrograms generated using torchaudio do not seem to be as good in quality as those generated by Librosa, and Librosa doesn't run on the GPU.
To make this clearer, please find the attached spectrograms.
For torchaudio:
For Librosa: