yandex / faster-rnnlm

Faster Recurrent Neural Network Language Modeling Toolkit with Noise Contrastive Estimation and Hierarchical Softmax

perplexity results #35

Closed vince62s closed 7 years ago

vince62s commented 8 years ago

Anton,

I am running the toolkit on the text from the Cantab-Tedlium recipe in Kaldi. It's a text file of about 900 MB with a vocabulary of roughly 150K words. My question is: do you think it's normal to get a perplexity of 114 in HS mode and 124 in NCE mode? I would have expected the NCE results to be better than the HS ones, according to your home page.

(The parameters are the ones from the WSJ recipe for rnnlm.)

Thanks, Vincent

vince62s commented 8 years ago

To illustrate my point:

```
Read the vocabulary: 149999 words
Restoring existing nnet
Constructing RNN: layer_size=400, layer_type=sigmoid, layer_count=1, maxent_hash_size=1999936667, maxent_order=4, vocab_size=149999, use_nce=0
Contructed HS: arity=2, height=28
Test entropy 6.834538
Perplexity is 114.13
```
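(As a sanity check on the numbers above, not part of the original thread: the reported perplexity is consistent with reading the test entropy as bits per word, i.e. perplexity = 2^entropy.)

```python
# Sketch: how the reported "Test entropy" relates to "Perplexity".
# Assumption: entropy is measured in bits per word, so perplexity = 2 ** entropy.
hs_entropy_bits = 6.834538      # "Test entropy" from the HS log above
print(2 ** hs_entropy_bits)     # ~114.13, matching "Perplexity is 114.13"
```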

```
Read the vocabulary: 149999 words
Restoring existing nnet
Constructing RNN: layer_size=400, layer_type=sigmoid, layer_count=1, maxent_hash_size=1999936667, maxent_order=4, vocab_size=149999, use_nce=1
Constructing NCE: layer_size=400, maxent_hash_size=1999936667, cuda=0, ln(Z)=9.000000
Use -nce-accurate-test to calculate entropy
Perplexity is 123.375
```
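(Editorial note on the NCE log: as the "Use -nce-accurate-test to calculate entropy" hint suggests, this run scores words with the model's unnormalized outputs and a fixed ln(Z) = 9 rather than an exact softmax over the 150K-word vocabulary, so this perplexity is an approximation and not strictly comparable to the HS figure. A toy sketch of the distinction, assuming that reading of the log; the scores and word id below are made up for illustration:)

```python
# Toy sketch (not toolkit code): fast NCE test vs. exact normalization.
# The fast path treats exp(score - ln_Z) with a constant ln_Z as a probability;
# an accurate test normalizes the scores over the whole vocabulary.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.standard_normal(149999)        # hypothetical output-layer scores
target = 42                                 # some target word id

exact_logZ = np.log(np.exp(scores).sum())   # exact softmax normalizer
fixed_logZ = 9.0                            # constant ln(Z) from the NCE log

exact_logp = scores[target] - exact_logZ    # true log-probability of the word
approx_logp = scores[target] - fixed_logZ   # what a fixed-ln(Z) test would use
print(exact_logp, approx_logp)              # differ unless ln(Z) really is 9
```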