libAudioFlux / audioFlux

A library for audio and music analysis, feature extraction.
https://audioflux.top
MIT License

Large memory allocations #27

Open tiesiogdvd opened 1 year ago

tiesiogdvd commented 1 year ago

I am trying to use the library through JNI on Android. Following the example in issue #26, I ran into a problem: calling spectrogramObj_spectrogram(...) to generate mel spectrograms uses a very large amount of memory. For a 3-minute audio track with 15 million samples and a slideLength of 512, it allocates 1.4 GB at the start of the spectrogram computation. The amount of memory used appears to scale linearly with the slide length. Is there a more efficient way to get the mel spectrograms without such large allocations?
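For a sense of scale, the frame count implied by these numbers can be estimated as follows. This is only a rough sketch: `frame_count` is a hypothetical helper, `fft_length = 4096` and 128 mel bands are assumptions not stated in the issue, frames are assumed to be taken without padding, and the byte figure covers only a float32 output matrix, not any intermediate buffers the library may allocate.

```python
def frame_count(num_samples: int, fft_length: int, slide_length: int) -> int:
    """Number of full frames in a signal when no padding is applied."""
    if num_samples < fft_length:
        return 0
    return (num_samples - fft_length) // slide_length + 1


num_samples = 15_000_000  # from the issue
slide_length = 512        # from the issue
fft_length = 4096         # assumption, not stated in the issue
num_mel = 128             # assumption, not stated in the issue

frames = frame_count(num_samples, fft_length, slide_length)
output_bytes = frames * num_mel * 4  # float32 output matrix only

print(f"{frames} frames, about {output_bytes / 1e6:.1f} MB of mel output")
```

Under these assumptions the mel output itself is on the order of 15 MB, which suggests that most of the reported 1.4 GB is not the final mel matrix.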

wtq2255 commented 12 months ago

@tiesiogdvd The spectrogram of a 3-minute track is already huge. In production, spectrograms are generally computed in real time, for example on 128 ms of data every 32 ms. For long audio, you can process the signal in segments, for example computing the spectrogram every 2-3 seconds, and then concatenate the results.

import audioflux as af
import numpy as np

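# Load the bundled sample; radix2_exp=12 gives fft_length = 2**12 = 4096
data_arr, sr = af.read(af.utils.sample_path('220'))
obj = af.MelSpectrogram(num=128, samplate=sr, radix2_exp=12, slide_length=1024)
seg_size = int(0.512 * sr)  # segment length in samples (0.512 s)

mel_data_list = []
for i in range(0, len(data_arr), seg_size):
    # Start each segment (fft_length - slide_length) samples early so that
    # frames continue across the segment boundary
    start_idx = max(0, i - obj.fft_length + obj.slide_length)
    end_idx = i + seg_size
    _data_arr = data_arr[start_idx:end_idx]
    if len(_data_arr) < obj.fft_length:
        break  # remaining samples are too short for one full frame
    feature = obj.spectrogram(_data_arr)
    feature = af.utils.power_to_db(feature)
    mel_data_list.append(feature)

# Concatenate the per-segment spectrograms along the time axis
mel_data_arr = np.hstack(mel_data_list)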
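
The index arithmetic in the loop above can be checked without audioFlux. The following is a minimal pure-Python sketch. `frame_starts` and `segment_bounds` are hypothetical helper names, not part of the library. It assumes frames are taken without padding, that the segment size and FFT length are both multiples of the slide length, and that a segment is at least one FFT length long.

```python
def frame_starts(start, end, fft_length, slide_length):
    """Start indices of all full frames inside samples[start:end] (no padding)."""
    n = end - start
    if n < fft_length:
        return []
    count = (n - fft_length) // slide_length + 1
    return [start + k * slide_length for k in range(count)]


def segment_bounds(num_samples, seg_size, fft_length, slide_length):
    """Yield (start, end) ranges whose frames tile the signal with no gaps or repeats."""
    if seg_size % slide_length or fft_length % slide_length:
        raise ValueError("seg_size and fft_length must be multiples of slide_length")
    if seg_size < fft_length:
        raise ValueError("seg_size must be at least fft_length")
    overlap = fft_length - slide_length  # samples re-read from the previous segment
    for i in range(0, num_samples, seg_size):
        start = max(0, i - overlap)
        end = min(i + seg_size, num_samples)
        if end - start < fft_length:
            break  # too short for one full frame
        yield start, end


# Example values only: 4096-point FFT, slide of 1024, 16384-sample segments
bounds = list(segment_bounds(100_000, 16_384, 4096, 1024))
segmented = [s for a, b in bounds for s in frame_starts(a, b, 4096, 1024)]
whole = frame_starts(0, 100_000, 4096, 1024)
print(segmented == whole)
```

If the segment size is not a multiple of the slide length, the frame positions in later segments drift relative to a single pass over the whole signal, so the concatenated result would no longer match it frame for frame.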