Closed — amirhmk closed this issue 3 years ago
Hello @amirhmk. Why do you need to save these spectrograms? couldn't you pass the NumPy arrays directly to your sound event detector, and spare the overhead of saving/loading?
@lostanlen That's a great point, and I would rather not. I am using template matching (to simplify the process) as my event detection method, so I am trying to match the sample spectrogram with something like this template:

The arrays that I get from `amplitude_to_db` do not quite match this format and are much more difficult to deal with. Do you think there is a way to still perform template matching without the output from the plot?
Do you have access to the audio of your templates, or do you only have their spectrograms? If you have the audio, the easiest solution is to recompute their spectrograms with the same librosa-based pipeline you're already developing.
On a related note: with some colleagues from C4DM, we have written a simple-minded template matching library built on librosa: https://github.com/c4dm/dcase-few-shot-bioacoustic/tree/main/baselines/cross_correlation
I do have access to the audio files. That does make sense; I just haven't been able to find another way that works as well as this! I basically crop the template manually from the original audio file, making sure it's on the same scale as the recordings.
Thank you so much for the code sample. This definitely looks promising, but I think there's way more data to deal with than just the extracted template 😢 @lostanlen
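For what it's worth, matching can be done directly on the dB arrays, without images. Below is a small sketch (not the linked C4DM code) of normalized cross-correlation: the template slides along the time axis of a longer spectrogram, and each window is scored by its correlation with the template.

```python
import numpy as np

def match_template(spec, tmpl):
    """Slide tmpl along the time axis of spec (both 2-D dB arrays with the
    same number of frequency bins); return the best offset and its score."""
    n = spec.shape[1] - tmpl.shape[1] + 1
    t = (tmpl - tmpl.mean()) / (tmpl.std() + 1e-8)  # zero-mean, unit-variance template
    scores = np.empty(n)
    for i in range(n):
        w = spec[:, i:i + tmpl.shape[1]]
        w = (w - w.mean()) / (w.std() + 1e-8)
        scores[i] = np.mean(t * w)  # normalized cross-correlation in [-1, 1]
    best = int(np.argmax(scores))
    return best, scores[best]
```

A detection is then just a threshold on the best score; the offset converts to seconds via `offset * hop_length / sr`.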
I wrote some code for this recently for a paper I was publishing. Here's the relevant code snippet; you can futz around with it. For example, I wanted a gray colormap (for color-blind readers), but you can change any of the options. I include a larger code sample so you can see how to adjust things:
```python
fig, ax = plt.subplots(figsize=set_size(0.225, 0.2))
# plt.axis('off')
img = librosa.display.specshow(D, x_axis='time',
                               # y_axis='log', hop_length=256, sr=SR,
                               y_axis='hz', hop_length=256, sr=SR,
                               fmax=MAX_HZ, ax=ax, cmap='gray')
plt.xticks([0, 1, 2])
plt.ylim(30, 4200)
if ylabels:
    plt.yticks([30, 1000, 2000, 3000, 4000])
else:
    ax.get_yaxis().set_visible(False)
    ax.set_ylabel(None)
ax.set_xlabel(None)
plt.savefig("../output/final/figures/stft-%d-%s-ylabel-%s.pdf" % (f0hz, n_fft, ylabels))
plt.plot()
```
@turian That's exactly what I'm doing, but `plt.savefig` takes 2 seconds (70% of the end-to-end loop) to resolve. It'd be great if the `img` returned from `specshow` were a NumPy image or something, though from looking at the codebase I'm not sure whether this can be done with matplotlib?
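One way to get a NumPy image out of a matplotlib figure without `savefig`, if that helps: with the Agg backend, the rendered figure can be read back as an RGBA array from the canvas buffer. This sketch uses `imshow` with random data as a stand-in for the actual `specshow` call, just to keep it self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend with an addressable pixel buffer
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
# stand-in for librosa.display.specshow(D, ax=ax, ...)
ax.imshow(np.random.rand(128, 87), aspect="auto", origin="lower", cmap="gray")

fig.canvas.draw()                            # render the figure into the buffer
img = np.asarray(fig.canvas.buffer_rgba())   # (height, width, 4) uint8 array
plt.close(fig)
```

Whether this is actually faster than `savefig` would need measuring, since the rendering cost is still paid; it only skips the file encoding and disk I/O.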
@amirhmk you can just take the spectrogram numpy matrix, and use Pillow to save it in grayscale as an image, if you care about speed
I'm going to close this issue out, as there's nothing to be done on the librosa side. Our spectrogram code already produces raw matrix data, and if you want to convert that to an image without going through matplotlib, it's possible but not something we would add as core functionality.
**Is your feature request related to a problem? Please describe.**
I am currently working with spectrograms of streaming 2-second clips to perform sound event detection. I am working on an embedded device and have to be able to process those 2-second clips in a short time.
Currently I am bottlenecked by `plt.savefig()` as a way of saving the spectrogram after using `librosa.display.specshow()`. This operation alone takes ~0.8 seconds, which is around 75% of the end-to-end process (audio loading to detection). This is the gist of what I'm doing after recording the windows.
**Describe the solution you'd like**
It would be great if `librosa.display.specshow()` or another function returned the raw spectrogram as an image, which could then be saved via cv2/skimage/etc.

**Describe alternatives you've considered**
I tried saving the raw data, but the visualization is not as rich and wouldn't be useful.
I can also increase the window size to be 5 seconds, but that would be detrimental to the accuracy of the detections.
**Additional context**
Related issue which has been closed.