jameslyons / python_speech_features

This library provides common speech features for ASR including MFCCs and filterbank energies.
MIT License

Memory Error for large audio files #16

Open nicolls1 opened 8 years ago

nicolls1 commented 8 years ago

22-minute audio file, processed with:

    mfcc(signal, samplerate=16000, numcep=26, lowfreq=300, highfreq=4000, appendEnergy=True)

Traceback:

    File "/vagrant/dossier/gsapi/memo/features/base.py", line 54, in mfcc
      feat,energy = fbank(signal,samplerate,winlen,winstep,nfilt,nfft,lowfreq,highfreq,preemph)
    File "/vagrant/dossier/gsapi/memo/features/base.py", line 80, in fbank
      frames = sigproc.framesig(signal, winlen*samplerate, winstep*samplerate)
    File "/vagrant/dossier/gsapi/memo/features/sigproc.py", line 55, in framesig
      return frames*win
    MemoryError

For now I am just calling it in batches to avoid this problem, but this might be something the library should handle better.
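A rough sketch of what such a batching workaround could look like. This is not the library's code: `frame_signal`, `batched_features`, and `frames_per_batch` are hypothetical names, and the framing here simply drops a trailing partial frame instead of zero-padding it. The idea is to cut the signal on frame boundaries, with enough overlap that each batch produces exactly the frames the full signal would.

```python
import numpy as np

def frame_signal(signal, frame_len, frame_step):
    """Slice a 1-D signal into overlapping frames (trailing partial frame dropped)."""
    num_frames = 1 + (len(signal) - frame_len) // frame_step
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(num_frames)[:, None]
    return signal[idx]

def batched_features(signal, frame_len, frame_step, feature_fn, frames_per_batch=1000):
    """Apply feature_fn to the frames of `signal`, a limited number of frames at a time.

    feature_fn takes a (num_frames, frame_len) array and returns one row per frame.
    """
    signal = np.asarray(signal)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // frame_step
    out = []
    for first in range(0, num_frames, frames_per_batch):
        last = min(first + frames_per_batch, num_frames)
        # Samples covered by frames first .. last-1, including the overlap
        # that the final frame of this batch needs.
        start = first * frame_step
        stop = (last - 1) * frame_step + frame_len
        frames = frame_signal(signal[start:stop], frame_len, frame_step)
        out.append(feature_fn(frames))
    return np.concatenate(out)
```

Only one batch of frames is held in memory at a time. If whole chunks of audio are passed to `mfcc` instead, the results may differ slightly at chunk edges, since pre-emphasis and zero-padding of the last frame are then applied per chunk rather than once over the full signal.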

philipperemy commented 7 years ago

Are there any strong reasons why you cannot split this file into smaller files? How big is your 22-minute file in MB? Thanks

ainy commented 7 years ago

You can replace the line that ate your RAM: return frames*win with return frames

This multiplication does nothing; it is just a waste of RAM. No one seems to use this window function. It is called once with a single argument, frame_len, which is a parameter of the calling function, so there is no need for it to be a function.
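To put a number on the RAM involved, here is a back-of-the-envelope estimate of the frame matrix size. It assumes a mono signal, 8-byte floats, and the library's default winlen=0.025 and winstep=0.01 (the call above does not override them); `frame_matrix_size` is a hypothetical helper, not part of the library.

```python
def frame_matrix_size(duration_s, samplerate, winlen=0.025, winstep=0.01, itemsize=8):
    """Return (num_frames, bytes) for one (num_frames, frame_len) frame matrix."""
    num_samples = int(duration_s * samplerate)
    frame_len = int(round(winlen * samplerate))
    frame_step = int(round(winstep * samplerate))
    if num_samples <= frame_len:
        num_frames = 1
    else:
        # Ceiling division: the last partial frame is counted (it gets zero-padded).
        num_frames = 1 + -(-(num_samples - frame_len) // frame_step)
    return num_frames, num_frames * frame_len * itemsize

num_frames, nbytes = frame_matrix_size(22 * 60, 16000)
print(num_frames, nbytes / 1e9)  # frame count and size in GB for a 22-minute file
```

Under these assumptions a 22-minute file at 16 kHz gives roughly 132,000 frames and about 0.42 GB per frame-shaped array. If frames, win, and the product frames*win each exist at that shape at the same time, peak usage is several times that.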

ainy commented 7 years ago

It would be best to make the default value winfunc=lambda frame_len, numframes: 1 and change the last lines to:

    win = winfunc(frame_len, numframes)
    return frames*win

To use a window function the old way, you would pass winfunc=lambda frame_len, numframes: numpy.ones((numframes,frame_len))
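A minimal sketch of how that proposal might look inside a simplified framesig. This is an illustration under assumptions, not the library's actual implementation: the framing and zero-padding logic is rewritten here for brevity, and only the winfunc handling follows the suggestion above.

```python
import numpy as np

def framesig(sig, frame_len, frame_step, winfunc=lambda frame_len, numframes: 1):
    """Frame a 1-D signal; winfunc returns a scalar or anything that broadcasts
    against a (numframes, frame_len) array."""
    sig = np.asarray(sig, dtype=float)
    slen = len(sig)
    if slen <= frame_len:
        numframes = 1
    else:
        numframes = 1 + -(-(slen - frame_len) // frame_step)  # ceiling division
    # Zero-pad so the last frame is complete.
    padlen = (numframes - 1) * frame_step + frame_len
    padded = np.concatenate((sig, np.zeros(padlen - slen)))
    indices = np.arange(frame_len)[None, :] + frame_step * np.arange(numframes)[:, None]
    frames = padded[indices]
    win = winfunc(frame_len, numframes)
    return frames * win

sig = np.arange(10.0)
default = framesig(sig, 4, 2)  # scalar window, no window matrix allocated
old_way = framesig(sig, 4, 2, winfunc=lambda frame_len, numframes: np.ones((numframes, frame_len)))
hamming = framesig(sig, 4, 2, winfunc=lambda frame_len, numframes: np.hamming(frame_len))
```

With the scalar default, the (numframes, frame_len) window matrix is never built. Note that frames * win still allocates a new array for the product; an in-place frames *= win would avoid that copy as well.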