Open tiesiogdvd opened 1 year ago

I am trying to use the library through JNI on Android. Following the example in issue #26, I have run into very high memory usage when calling spectrogramObj_spectrogram(...) to generate Mel spectrograms. For a 3-minute audio track of about 15 million samples with a slideLength of 512, roughly 1.4 GB is allocated at the start of the spectrogram computation, and the amount of memory used appears to scale linearly with the slide length. Is there a more memory-efficient way to get the Mel spectrograms without such large allocations?
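As a rough back-of-envelope sketch (my own illustrative numbers, assuming fft_length = 2**radix2_exp = 4096 and float32 output, not a statement about audioFlux internals): the number of analysis frames, and therefore whatever per-frame working buffers get allocated, grows as the slide length shrinks.

num_samples = 15_000_000                 # ~3 minutes of audio
fft_length = 4096                        # assumed: 2 ** radix2_exp with radix2_exp=12
num_mels = 128                           # assumed Mel bin count
for slide_length in (2048, 1024, 512):
    num_frames = (num_samples - fft_length) // slide_length + 1
    out_mb = num_frames * num_mels * 4 / 1e6   # size of the float32 output matrix alone
    print(f'slide_length={slide_length}: {num_frames} frames, ~{out_mb:.0f} MB output matrix')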
@tiesiogdvd The spectrogram of a full 3-minute signal is already very large. In practice it is usually computed in real time, e.g. on 128 ms of data every 32 ms. For long audio, you can process it in segments, e.g. compute the spectrogram every 2-3 seconds and splice the results together at the end:
import audioflux as af
import numpy as np

data_arr, sr = af.read(af.utils.sample_path('220'))
obj = af.MelSpectrogram(num=128, samplate=sr, radix2_exp=12, slide_length=1024)

seg_size = int(0.512 * sr)  # segment length in samples
mel_data_list = []
for i in range(0, len(data_arr), seg_size):
    # Back off by (fft_length - slide_length) samples so frames that
    # straddle the segment boundary are not lost.
    start_idx = max(0, i - obj.fft_length + obj.slide_length)
    end_idx = i + seg_size
    _data_arr = data_arr[start_idx:end_idx]
    if len(_data_arr) < obj.fft_length:
        break  # remaining tail is shorter than one FFT window
    feature = obj.spectrogram(_data_arr)
    feature = af.utils.power_to_db(feature)
    mel_data_list.append(feature)

mel_data_arr = np.hstack(mel_data_list)  # splice the segments along the time axis
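To check the splicing on a short file, one possible sanity check (my sketch, continuing from the variables above; depending on how the library handles the signal edges, the two results may differ slightly at segment boundaries) is to compare against a one-shot computation:

full_arr = af.utils.power_to_db(obj.spectrogram(data_arr))
print('segmented:', mel_data_arr.shape, 'one-shot:', full_arr.shape)
n = min(mel_data_arr.shape[1], full_arr.shape[1])
print('max abs diff over shared frames:',
      np.abs(mel_data_arr[:, :n] - full_arr[:, :n]).max())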