readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Bug in FFT windowing #286

Open tom-huntington opened 2 years ago

tom-huntington commented 2 years ago

Your windowing for the FFT when calculating the MFCCs is incorrect, which will reduce performance a little (DTW seems very robust). The default runtime configuration results in frame_length = 1600, which is greater than fft_order = 512. As a result, the last frame_length - fft_order = 1088 elements of the Hamming window are chopped off. Your actual window is only the first third of the Hamming window, which is a very bad choice of window.

https://github.com/readbeyond/aeneas/blob/4d200a050690903b30b3d885b44714fecb23f18a/aeneas/mfcc.py#L218

https://github.com/readbeyond/aeneas/blob/4d200a050690903b30b3d885b44714fecb23f18a/aeneas/mfcc.py#L230

Beside the point, but this line should be:

self.hamming_window = numpy.hamming(frame_length) 

Your padding is appended only after you do the windowing https://github.com/readbeyond/aeneas/blob/4d200a050690903b30b3d885b44714fecb23f18a/aeneas/mfcc.py#L192

The same applies to the C code.

https://github.com/readbeyond/aeneas/blob/4d200a050690903b30b3d885b44714fecb23f18a/aeneas/cmfcc/cmfcc_func.c#L519

https://github.com/readbeyond/aeneas/blob/4d200a050690903b30b3d885b44714fecb23f18a/aeneas/cmfcc/cmfcc_func.c#L571

https://github.com/readbeyond/aeneas/blob/4d200a050690903b30b3d885b44714fecb23f18a/aeneas/cmfcc/cmfcc_func.c#L575-L585
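
The truncation described above can be reproduced with a short NumPy sketch. This is not the aeneas code: it assumes the FFT is taken with `numpy.fft.rfft` and an `n` argument smaller than the frame, and it uses a random frame as a stand-in for real audio. The values 1600 and 512 are the ones quoted in this issue; all variable names are illustrative.

```python
import numpy

# Values quoted in the issue; the random frame is a stand-in for real audio.
frame_length = 1600
fft_order = 512

rng = numpy.random.default_rng(0)
frame = rng.standard_normal(frame_length)

# Window the full frame, as one would expect to be correct.
window = numpy.hamming(frame_length)
windowed = frame * window

# numpy.fft.rfft crops its input to the first n samples when n < len(input).
spectrum = numpy.fft.rfft(windowed, n=fft_order)
spectrum_cropped = numpy.fft.rfft(windowed[:fft_order])

# The two spectra match, so only window[:fft_order] ever contributes.
same = numpy.allclose(spectrum, spectrum_cropped)

# The surviving part of the window starts at 0.08 and is still rising
# when it is cut off; it never reaches the Hamming peak of about 1.0.
effective_window = window[:fft_order]
dropped = frame_length - fft_order

print(same, dropped, effective_window[0], effective_window.max())
```

The effective window is a one-sided ramp rather than a symmetric taper, which is why it behaves so differently from a Hamming window.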

tom-huntington commented 2 years ago

It's hard to say what to change. For accuracy, I imagine you would want to keep the window_length the same, which requires a higher FFT order. For speed, however, you would want to keep the fft_order down, which I imagine is how this bug came about.

I think the best solution is to do away with the windowing altogether, i.e. use the rectangular window, as it's the widest.
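
The two options can be put side by side in a rough sketch. It rests on assumptions not stated in the issue: option A rounds the FFT order up to the next power of two that fits the frame, and option B reads "rectangular window" as taking the first fft_order samples unweighted. Neither is presented as the aeneas implementation, and the variable names are hypothetical.

```python
import numpy

frame_length = 1600      # value quoted in the issue
small_fft_order = 512    # value quoted in the issue

rng = numpy.random.default_rng(0)
frame = rng.standard_normal(frame_length)  # stand-in for one audio frame

# Option A (accuracy): keep the full Hamming window and raise the FFT order
# to the next power of two that holds the whole frame (an assumption).
large_fft_order = 1 << (frame_length - 1).bit_length()
full = numpy.fft.rfft(frame * numpy.hamming(frame_length), n=large_fft_order)

# Option B (speed): keep the small FFT order and apply no taper at all,
# i.e. a rectangular window over the samples the FFT actually sees.
rect = numpy.fft.rfft(frame[:small_fft_order], n=small_fft_order)

# rfft returns n // 2 + 1 bins, so option A yields about four times as many.
bins_full = full.shape[0]
bins_rect = rect.shape[0]

print(large_fft_order, bins_full, bins_rect)
```

Option A changes the number of frequency bins, so anything downstream that is sized from fft_order (for example a mel filter bank) would have to be rebuilt to match. Option B leaves those sizes alone but analyses only the first 512 samples of each frame.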