rolczynski / Automatic-Speech-Recognition

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
GNU Affero General Public License v3.0
222 stars, 62 forks

Log filterbank and keras version #5

Closed bagustris closed 5 years ago

bagustris commented 5 years ago

Hi,

I see that you recently changed from MFCC to log filterbank for acoustic feature extraction. Is there any particular reason (improvement, etc.)? Wouldn't it be better to keep both so users can choose and experiment with the two feature extraction methods?

Also, since the main principle of your proposed platform is simplicity and understandability, wouldn't it be better to keep using Keras instead of tf.keras? I see it in your to-do/contributing list.

rolczynski commented 5 years ago

hey!

Thank you very much for your questions! The log filter banks (logfbanks) work as well as MFCC (according to recent papers and my own experiments). This is an excellent tutorial: https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html Especially: "With the advent of Deep Learning in speech systems, one might question if MFCCs are still the right choice given that deep neural networks are less susceptible to highly correlated input and therefore the Discrete Cosine Transform (DCT) is no longer a necessary step. It is beneficial to note that Discrete Cosine Transform (DCT) is a linear transformation, and therefore undesirable as it discards some information in speech signals which are highly non-linear."
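To make the relationship concrete: MFCCs are commonly obtained by applying a DCT to the log filterbank energies and keeping only the first few coefficients. The sketch below is not this repository's code. It uses a hand-written orthonormal DCT-II and made-up energy values for a single frame, purely as an illustration. It suggests that the full DCT is invertible, so in this setup the information loss comes from truncating the coefficients.

```python
import math


def dct_ii(values, num_coeffs=None):
    """Orthonormal DCT-II of a sequence, optionally truncated to num_coeffs."""
    n = len(values)
    if num_coeffs is None:
        num_coeffs = n
    coeffs = []
    for k in range(num_coeffs):
        total = sum(
            v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def idct_ii(coeffs, n):
    """Inverse of the orthonormal DCT-II; missing coefficients count as zero."""
    signal = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
        signal.append(total)
    return signal


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical log filterbank energies for one frame (8 mel bands).
log_fbank = [2.1, 3.4, 3.9, 3.2, 2.5, 1.8, 1.1, 0.7]

# "MFCC-like" features: all coefficients vs. only the first four.
mfcc_full = dct_ii(log_fbank)
mfcc_trunc = dct_ii(log_fbank, num_coeffs=4)

# Reconstruct the log filterbank energies from each feature set.
err_full = max_abs_error(log_fbank, idct_ii(mfcc_full, len(log_fbank)))
err_trunc = max_abs_error(log_fbank, idct_ii(mfcc_trunc, len(log_fbank)))

print(f"reconstruction error, all coefficients: {err_full:.2e}")
print(f"reconstruction error, first 4 coefficients: {err_trunc:.2e}")
```

With all coefficients the reconstruction error is at the level of floating-point rounding, while the truncated version cannot recover the original energies.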

I am open to discussing this further. I try to keep this implementation simple, without requiring extensions. Of course, you can override this method.

tf.keras is (mostly) a refactored version of Keras. You should check the code under the hood.

bagustris commented 5 years ago

Could you provide a %WER comparison between MFCC and log filterbank?
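For reference, WER is usually computed as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal sketch, assuming whitespace tokenization and using made-up transcripts (the function name `word_error_rate` is hypothetical and not part of this repository):

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {100 * wer:.1f}%")
```

Comparing the two feature types would then mean training the same model once per feature type and averaging this metric over the same test set.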

rolczynski commented 5 years ago

Unfortunately, I have not done comprehensive research on the difference between MFCC and log filter banks.