librosa / librosa

Python library for audio and music analysis
https://librosa.org/
ISC License

Ability to save Spectrogram as image #1313

Closed amirhmk closed 3 years ago

amirhmk commented 3 years ago

Is your feature request related to a problem? Please describe.

I am currently working with spectrograms of streaming 2-second clips to perform sound event detection. I am working on an embedded device and have to process each 2-second clip in a short time.

Currently I am bottlenecked by plt.savefig() as a way of saving the spectrogram after using librosa.display.specshow(). This operation alone takes ~0.8 seconds, which is around 75% of the end-to-end process (audio loading to detection).

This is the gist of what I'm doing after recording the windows.

Describe the solution you'd like

It would be great if librosa.display.specshow() or another function returned the raw spectrogram as an image, which could then be saved via cv2/skimage/etc.

Describe alternatives you've considered

I tried saving the raw data, but the visualization is not as rich and wouldn't be useful.

I can also increase the window size to be 5 seconds, but that would be detrimental to the accuracy of the detections.

Additional context

Related issue, which has been closed

lostanlen commented 3 years ago

Hello @amirhmk. Why do you need to save these spectrograms? Couldn't you pass the NumPy arrays directly to your sound event detector and spare the overhead of saving and loading?

amirhmk commented 3 years ago

@lostanlen That's a great point, but I would rather not. To simplify a bit, I am using template matching for event detection, so I am trying to match the sample spectrogram against something like this template:

[Image: eng_template_GT]

The arrays that I get from amplitude_to_db do not quite match this format and are much more difficult to deal with. Do you think there is a way to still perform template matching without the output from the plot?

lostanlen commented 3 years ago

Do you have access to the audio of your templates, or do you only have their spectrograms? If you have the audio, the easiest solution is to recompute their spectrograms with the same librosa-based pipeline you're already developing.

On a related note, with some colleagues from C4DM, we have written a simple-minded template matching library based on librosa: https://github.com/c4dm/dcase-few-shot-bioacoustic/tree/main/baselines/cross_correlation
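To illustrate the idea of matching on the arrays themselves, here is a minimal sketch of normalized cross-correlation along the time axis. It is not the code from the repository linked above; the function name `match_template_time` and the random stand-in data are assumptions for illustration. In practice, both `S` and `T` would come from the same spectrogram pipeline (for example, the output of `librosa.amplitude_to_db`).

```python
import numpy as np


def match_template_time(S: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Slide template T (freq x t_len) along the time axis of S (freq x n_frames).

    Returns one normalized cross-correlation score in [-1, 1] per time offset.
    Hypothetical helper, written for illustration only.
    """
    if S.ndim != 2 or T.ndim != 2:
        raise ValueError("S and T must be 2-D (frequency x time)")
    if S.shape[0] != T.shape[0] or T.shape[1] > S.shape[1]:
        raise ValueError("T must have the same number of bins as S and be no longer")

    t_len = T.shape[1]
    Tz = T - T.mean()                # zero-mean template
    T_norm = np.linalg.norm(Tz)
    scores = np.zeros(S.shape[1] - t_len + 1)

    for i in range(scores.size):
        W = S[:, i:i + t_len]        # window of the same shape as the template
        Wz = W - W.mean()
        denom = T_norm * np.linalg.norm(Wz)
        if denom > 0:                # leave the score at 0 for flat windows
            scores[i] = float(np.sum(Wz * Tz) / denom)
    return scores


# Stand-in data: random "spectrogram", with the template cut from frames 50..79.
rng = np.random.default_rng(0)
S = rng.standard_normal((64, 200))
T = S[:, 50:80].copy()

scores = match_template_time(S, T)
best_offset = int(np.argmax(scores))  # frame index where the template fits best
```

Because the template is cut from the same array, the best offset is the frame where it was cropped. With real recordings, a detection threshold on the score would have to be chosen empirically.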

amirhmk commented 3 years ago

I do have access to the audio. That does make sense, I just haven't been able to find another way that works as well as this! I basically crop the template manually from the original audio file, making sure it's on the same scale as the recordings.

Thank you so much for the code sample, this definitely looks promising, but I think there's way more data to deal with than just the extracted template 😢 @lostanlen

turian commented 3 years ago

I wrote some code for this recently for a paper I was publishing. Here's the relevant snippet, which you can futz around with. For example, I wanted a gray colormap (for color-blind readers), but you can change any of the options. I've included a larger code sample so you can see how to adjust things:

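    import matplotlib.pyplot as plt
    import librosa.display

    # D (the matrix passed to specshow), SR, MAX_HZ, ylabels, f0hz, n_fft and
    # the set_size() helper come from the surrounding script and are not shown.
    fig, ax = plt.subplots(figsize=set_size(0.225, 0.2))

    # Uncomment to hide the axes entirely:
    # plt.axis('off')

    # For a log-frequency axis, use y_axis='log' instead of y_axis='hz'.
    img = librosa.display.specshow(D, x_axis='time',
                                   y_axis='hz', hop_length=256, sr=SR,
                                   fmax=MAX_HZ, ax=ax, cmap='gray')

    # Fixed tick positions and frequency range for the figure
    plt.xticks([0, 1, 2])
    plt.ylim(30, 4200)
    if ylabels:
        plt.yticks([30, 1000, 2000, 3000, 4000])
    else:
        ax.get_yaxis().set_visible(False)
    ax.set_ylabel(None)
    ax.set_xlabel(None)

    plt.savefig("../output/final/figures/stft-%d-%s-ylabel-%s.pdf" % (f0hz, n_fft, ylabels))

    plt.plot()

amirhmk commented 3 years ago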

@turian That's exactly what I'm doing, but plt.savefig takes 2 seconds (70% of the end-to-end loop) to complete. It'd be great if the img returned from specshow were a NumPy image or something similar, though from looking at the codebase I'm not sure whether this can be done with matplotlib.

turian commented 3 years ago

@amirhmk If you care about speed, you can just take the spectrogram NumPy matrix and use Pillow to save it as a grayscale image.
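A minimal sketch of that suggestion, assuming a 2-D dB-scaled array such as the output of `librosa.amplitude_to_db`. The helper name `spectrogram_to_image`, the min-max scaling, and the random stand-in data are illustrative choices, not part of librosa or of the code above.

```python
import numpy as np
from PIL import Image


def spectrogram_to_image(S_db: np.ndarray) -> Image.Image:
    """Convert a 2-D spectrogram (frequency x time) to an 8-bit grayscale image.

    Hypothetical helper: values are min-max scaled to 0..255.
    """
    S = np.asarray(S_db, dtype=np.float64)
    lo, hi = S.min(), S.max()
    scale = (hi - lo) if hi > lo else 1.0   # avoid dividing by zero on flat input
    pixels = np.round(255.0 * (S - lo) / scale).astype(np.uint8)
    # Row 0 of the array is the lowest frequency bin; flip vertically so low
    # frequencies end up at the bottom of the image, as in a typical plot.
    return Image.fromarray(np.ascontiguousarray(pixels[::-1, :]))


# Stand-in for a real spectrogram, e.g. (assumption):
#   S_db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
rng = np.random.default_rng(0)
S_db = rng.uniform(-80.0, 0.0, size=(1025, 173))

img = spectrogram_to_image(S_db)
# img.save("spectrogram.png")  # writes the image to disk
```

The image is exactly one pixel per spectrogram cell, with no axes, ticks, or margins, so it will not look identical to a matplotlib figure. If templates are produced with the same function, the two stay directly comparable.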

bmcfee commented 3 years ago

I'm going to close this issue out, as there's nothing to be done on the librosa side. Our spectrogram code already produces raw matrix data, and if you want to convert that to an image without going through matplotlib, it's possible but not something we would add as core functionality.
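For readers who want a colored image rather than grayscale, one possible approach is to use a matplotlib colormap purely as a lookup table on the raw matrix, without creating a figure or calling savefig. This is a sketch under assumptions: the helper name `spectrogram_to_rgb`, the min-max scaling, and the choice of the "magma" colormap are illustrative, and the random array stands in for a real spectrogram.

```python
import numpy as np
from matplotlib import colormaps  # available in matplotlib >= 3.5


def spectrogram_to_rgb(S_db: np.ndarray, cmap_name: str = "magma") -> np.ndarray:
    """Map a 2-D spectrogram (frequency x time) to a uint8 RGB array.

    Hypothetical helper: no figure is rendered; the colormap is only used
    to look up a color for each normalized value.
    """
    S = np.asarray(S_db, dtype=np.float64)
    lo, hi = S.min(), S.max()
    scale = (hi - lo) if hi > lo else 1.0       # avoid dividing by zero
    norm = (S - lo) / scale                     # values in [0, 1]
    rgba = colormaps[cmap_name](norm, bytes=True)   # shape (freq, time, 4), uint8
    # Drop the alpha channel and flip so low frequencies are at the bottom.
    return np.ascontiguousarray(rgba[::-1, :, :3])


# Stand-in for a real dB-scaled spectrogram.
rng = np.random.default_rng(0)
S_db = rng.uniform(-80.0, 0.0, size=(128, 87))

rgb = spectrogram_to_rgb(S_db)
```

The resulting array can be handed to an image library of choice for saving or further processing. Note that channel order conventions differ between libraries (for example, OpenCV expects BGR).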