senarvi / theanolm

TheanoLM is a recurrent neural network language modeling tool implemented using Theano
Apache License 2.0

Decoding results #38

Closed: amrmalkhatib closed this issue 5 years ago

amrmalkhatib commented 6 years ago

I've trained a language model with the default training parameters, except that I used hierarchical softmax and a batch size of 64. The training data was a huge set of about one million documents, and I converted each document into a single line in the training file. Then I used this language model to decode lattices for spelling correction, where I provide candidates for each spelling error, but the results are frustrating. When I compared the choices of the LM with the baseline (where I manually choose the most frequent word in my data from the candidates), the LM is only 7 to 8 percent more accurate. I inspected the results, and most of the time the LM picks random words over other obviously frequent, correct words. I believe TheanoLM is more robust and accurate than what I got, so I think I may be doing something wrong in the training phase, or there is a better combination of training parameters than the one I used.
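
For reference, the training invocation looked roughly like this. This is a sketch with placeholder file names: the hierarchical softmax is selected with an `hsoftmax` output layer in the architecture description file rather than a command-line switch, and the exact option names should be checked against `theanolm train --help`.

```bash
# Train with default optimization parameters; only the batch size and
# the output layer type (hsoftmax, set in the architecture file) differ
# from the defaults. All file names are placeholders.
theanolm train model.h5 \
    --training-set train.txt \
    --validation-file valid.txt \
    --architecture my-hsoftmax.arch \
    --batch-size 64
```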

senarvi commented 6 years ago

Maybe you should first forget lattice decoding and make sure that your model is working correctly. Inspect some very simple sentences. Use the score command to obtain word probabilities and see whether TheanoLM gives a higher probability to the "random" word than to the frequent word. Generally, frequent words should get higher probabilities than very infrequent words. You can also train an n-gram LM using e.g. SRILM and see what probabilities it gives. I suggest also comparing n-gram and TheanoLM perplexities. If TheanoLM gives a higher perplexity, then something almost certainly went wrong with the training hyperparameters.
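
Something along these lines should work. File names are placeholders, and the SRILM commands assume a standard interpolated Kneser-Ney trigram as the baseline:

```bash
# Per-word log probabilities, to compare the "random" word against
# the frequent word in the same context.
theanolm score model.h5 sentences.txt --output word-scores

# Corpus-level perplexity from TheanoLM.
theanolm score model.h5 test.txt --output perplexity

# Baseline: train an interpolated Kneser-Ney trigram with SRILM
# and compute its perplexity on the same test set.
ngram-count -text train.txt -order 3 -kndiscount -interpolate -lm trigram.lm
ngram -lm trigram.lm -ppl test.txt
```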