Proposes a CNN approach to the Keyword Spotting (KWS) task that outperforms the DNN approach by 27~44% relative
Experiments under two constraints: limiting the number of multiplies and limiting the number of parameters
Details
KWS performed on mobile devices must be accurate and fast, with a small memory and compute footprint.
Existing DNN architecture
40-dimensional log-mel filterbank features, computed over 25ms windows with a 10ms frame shift, as input
outputs posteriors for a filler (non-keyword) class and the keyword words (e.g., "answer", "call")
posterior handling combines the frame-level posteriors into a single keyword score
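The posterior-handling step can be sketched as follows. This is a minimal numpy illustration, not the paper's exact method: the window sizes are illustrative, and the combination rule here is a geometric mean of per-word maxima within a sliding window (the function names are mine).

```python
import numpy as np

def smooth_posteriors(post, w_smooth=30):
    """Moving-average smoothing of frame-level posteriors.
    post: (T, K) array of per-frame posteriors over K output classes."""
    T, K = post.shape
    smoothed = np.zeros_like(post)
    for t in range(T):
        lo = max(0, t - w_smooth + 1)
        smoothed[t] = post[lo:t + 1].mean(axis=0)
    return smoothed

def keyword_confidence(smoothed, keyword_idx, w_max=100):
    """Combine smoothed posteriors into a single score per frame:
    geometric mean of each keyword word's maximum posterior
    within a sliding window (filler class is ignored)."""
    T = smoothed.shape[0]
    n = len(keyword_idx)
    scores = np.zeros(T)
    for t in range(T):
        lo = max(0, t - w_max + 1)
        maxima = [smoothed[lo:t + 1, k].max() for k in keyword_idx]
        scores[t] = np.prod(maxima) ** (1.0 / n)
    return scores
```

A detection is fired when the score exceeds a tuned threshold; smoothing first makes the per-frame posteriors far less noisy before they are combined.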
CNN Architecture
Good description of CNNs and typical CNN architectures
In-depth analysis of the effect of varying parameters such as convolution filter size, stride, and pooling in time/frequency
Limit Multiplies
budget of ~500K multiplies, compared to ~9M in a typical CNN
Limit Parameters
by pooling in time and frequency
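The multiply/parameter trade-off can be made concrete with a small cost-counting helper. This is a sketch: it assumes 'valid' convolution with a single input feature map by default, and the example sizes in the usage note are illustrative, not the paper's exact configurations.

```python
def conv_layer_cost(t, f, m, r, n, s=1, v=1, in_maps=1):
    """Parameter and multiply counts for one convolutional layer.
    t, f    : input size in time and frequency
    m, r    : filter size in time and frequency
    n       : number of feature maps (filters)
    s, v    : stride in time and frequency
    in_maps : number of input feature maps
    Assumes 'valid' convolution (no padding)."""
    out_t = (t - m) // s + 1
    out_f = (f - r) // v + 1
    params = n * in_maps * m * r          # weights are shared across positions
    multiplies = params * out_t * out_f   # each output position costs one filter application
    return params, multiplies, (out_t, out_f)
```

For example, `conv_layer_cost(32, 40, 20, 8, 64)` versus `conv_layer_cost(32, 40, 20, 8, 64, v=4)` shows why striding in frequency helps under a multiply budget: the parameter count is unchanged, but multiplies drop by roughly the stride factor.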
Result
Pooling in Frequency
CNN performance improves as the pooling size increases, and saturates at p = 3
Best CNN model outperforms the DNN by 41% relative
Limiting Multiplies
Best performance comes from striding the frequency filter with 50% overlap, with no pooling in frequency
Pooling in frequency is helpful, but the number of feature maps must be reduced drastically to stay within the multiply budget
Limiting Parameters
Striding in time leads to worse performance
Pooling in time improves performance: modeling the relationship between neighboring frames before sub-sampling is more effective than striding in time, which a-priori selects which neighboring frames to filter.
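The stride-vs-pool distinction above can be sketched in 1-D numpy (the function names and the 1-D setting are illustrative): striding filters only every s-th position, subsampling *before* filtering, while pooling filters every position and then takes a max over p neighbors, subsampling *after* filtering.

```python
import numpy as np

def conv1d_valid(x, w, stride=1):
    """1-D 'valid' cross-correlation with the given stride."""
    L = (len(x) - len(w)) // stride + 1
    return np.array([np.dot(x[i * stride : i * stride + len(w)], w)
                     for i in range(L)])

def stride_in_time(x, w, s):
    """Subsample a-priori: compute the filter response only at every s-th position."""
    return conv1d_valid(x, w, stride=s)

def pool_in_time(x, w, p):
    """Filter every position first, then max-pool over p neighboring responses."""
    y = conv1d_valid(x, w, stride=1)
    L = len(y) // p
    return np.array([y[i * p : (i + 1) * p].max() for i in range(L)])
```

Both produce outputs of the same length, but pooling sees every filter response before discarding any, at the cost of more multiplies in that layer; this is consistent with the note that pooling in time is used when limiting parameters rather than multiplies.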
Personal Thoughts
Good experimental setup: the effect of changing each variable is isolated and understood
Limiting parameters/multiplies to find the best model under a resource budget is good engineering practice
Link : https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43969.pdf
Authors : Sainath et al. 2015