masakhane-io / masakhane-reading-group

Agile reading group that works

[18/06/2020] 5:15PM GMT+1 : Neural Network Language Models for Low Resource Languages #8

Closed (keleog closed this issue 4 years ago)

keleog commented 4 years ago

For resource-rich languages, recent works have shown Neural Network based Language Models (NNLMs) to be an effective modeling technique for Automatic Speech Recognition, outperforming standard n-gram language models (LMs). For low-resource languages, however, the performance of NNLMs has not been well explored. In this paper, we evaluate the effectiveness of NNLMs for low-resource languages and show that NNLMs learn better word probabilities than state-of-the-art n-gram models even when the amount of training data is severely limited. We show that interpolated NNLMs obtain a lower WER than standard n-gram models, no matter the amount of training data. Additionally, we observe that with small amounts of data (approx. 100k training tokens), feed-forward NNLMs obtain lower perplexity than recurrent NNLMs, while for the larger data condition (500k-1M training tokens), recurrent NNLMs can obtain lower perplexity than feed-forward models.
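For anyone new to the terminology in the abstract, here is a minimal sketch (not from the paper) of what "interpolated NNLM" and perplexity refer to. The functions `p_nnlm` and `p_ngram` and the interpolation weight `lam` are illustrative stand-ins for the per-word probabilities a trained neural LM and n-gram LM would assign, not the paper's actual code.

```python
import math

def interpolated_prob(word, history, p_nnlm, p_ngram, lam=0.5):
    """Linear interpolation of NNLM and n-gram word probabilities.

    p_nnlm and p_ngram are hypothetical callables that return
    P(word | history) under each model; lam is the interpolation weight.
    """
    return lam * p_nnlm(word, history) + (1.0 - lam) * p_ngram(word, history)

def perplexity(tokens, prob_fn):
    """Perplexity = exp of the average negative log-probability per token."""
    log_prob = 0.0
    for i, word in enumerate(tokens):
        history = tokens[max(0, i - 4):i]  # truncated context, e.g. a 5-gram window
        log_prob += math.log(prob_fn(word, history))
    return math.exp(-log_prob / len(tokens))
```

Lower perplexity means the model assigns higher probability to the held-out text; the abstract's comparisons (feed-forward vs. recurrent NNLMs at 100k vs. 500k-1M tokens) are in terms of this quantity.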

keleog commented 4 years ago

Update: This paper was discussed on 25/06/2020 because no meeting was held on 18/06/2020.