For resource-rich languages, recent work has shown Neural Network based Language Models (NNLMs) to be an effective modeling technique for Automatic Speech Recognition, outperforming standard n-gram language models (LMs). For low-resource languages, however, the performance of NNLMs has not been well explored. In this paper, we evaluate the effectiveness of NNLMs for low-resource languages and show that NNLMs learn better word probabilities than state-of-the-art n-gram models even when the amount of training data is severely limited. We show that interpolated NNLMs obtain a lower WER than standard n-gram models, no matter the amount of training data. Additionally, we observe that with small amounts of data (approx. 100k training tokens), feed-forward NNLMs obtain lower perplexity than recurrent NNLMs, while for the larger data condition (500k-1M training tokens), recurrent NNLMs can obtain lower perplexity than feed-forward models.
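As a point of reference for the interpolation mentioned above, a minimal sketch of standard linear interpolation between an NNLM and an n-gram LM is given below; the single weight $\lambda$ and its tuning on held-out data are assumptions for illustration, not details taken from this paper:

$$
P_{\text{interp}}(w \mid h) \;=\; \lambda\, P_{\text{NNLM}}(w \mid h) \;+\; (1 - \lambda)\, P_{n\text{-gram}}(w \mid h), \qquad 0 \le \lambda \le 1
$$

Here $w$ is the predicted word and $h$ its history; $\lambda$ would typically be chosen to minimize perplexity on a development set.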