microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Poor word LM results on PTB #2799

Open VHellendoorn opened 6 years ago

VHellendoorn commented 6 years ago

Edit: I get the same results on CUDA 8 and 9, and the LightRNN example shows similar performance.

The WordLMWithSampledSoftmax example allows one to run a common "small" RNN configuration (a 2-layer, 200-dimensional LSTM) on the PTB dataset (10K vocabulary, ~1M tokens). This is a standard setup in which test/validation perplexity should level out around 115 (e.g. in TensorFlow). However, even with full softmax, this implementation doesn't come close to that, instead leveling out around 300 PPL. This remains true across a range of configurations: without momentum, with plain SGD, with lower/higher learning rates, with different batch sizes, etc. The setup I have in mind is sketched below.
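For concreteness, this is roughly the model I mean, as a minimal sketch in the CNTK Python API. The layer structure and hyperparameters are my own illustrative assumptions, not the exact contents of the WordLMWithSampledSoftmax example:

```python
# Minimal sketch of the "small" PTB word LM described above (CNTK Python API).
# Names and hyperparameters are illustrative, not the example's exact code.
import cntk as C

VOCAB_SIZE = 10000   # PTB vocabulary
HIDDEN_DIM = 200     # "small" config: 2 x 200-dim LSTM

# One-hot word sequences in; next-word (shifted-by-one) targets out.
features = C.sequence.input_variable(VOCAB_SIZE, is_sparse=True)
labels   = C.sequence.input_variable(VOCAB_SIZE, is_sparse=True)

# 2-layer LSTM language model with a full-softmax output layer.
model = C.layers.Sequential([
    C.layers.Embedding(HIDDEN_DIM),
    C.layers.Recurrence(C.layers.LSTM(HIDDEN_DIM)),
    C.layers.Recurrence(C.layers.LSTM(HIDDEN_DIM)),
    C.layers.Dense(VOCAB_SIZE)
])
z = model(features)

# Full-softmax cross-entropy; perplexity = exp(mean loss per token).
ce  = C.cross_entropy_with_softmax(z, labels)
err = C.classification_error(z, labels)

lr = C.learning_parameter_schedule(1.0)  # plain SGD, lr 1.0 (illustrative)
learner = C.sgd(z.parameters, lr,
                gradient_clipping_threshold_per_sample=5.0)
trainer = C.Trainer(z, (ce, err), [learner])
```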

There does not seem to be any other general language-modeling tutorial or example, nor any other issue addressing this, and the expected results on PTB for this example are not published anywhere. Are these results expected? What configuration would replicate other "small" configurations for the PTB dataset?
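For reference, the "small" configuration from the TensorFlow PTB tutorial that I am comparing against, which reaches roughly 115 test perplexity, is approximately the following (from memory, so the exact values are an assumption on my part):

```python
# Approximate "small" config from the TensorFlow PTB tutorial (from memory;
# given only as the reference point I'm comparing CNTK against).
small_config = dict(
    init_scale=0.1,      # uniform weight init in [-0.1, 0.1]
    learning_rate=1.0,   # plain SGD
    lr_decay=0.5,        # decay applied after max_epoch
    max_grad_norm=5,     # gradient clipping
    num_layers=2,
    num_steps=20,        # BPTT truncation length
    hidden_size=200,
    max_epoch=4,         # epochs at the full learning rate
    max_max_epoch=13,    # total epochs
    keep_prob=1.0,       # no dropout
    batch_size=20,
    vocab_size=10000,
)
```

Thanks!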

VHellendoorn commented 6 years ago

Any comments on this? Is it a mistake on my part? I've preferred CNTK over TensorFlow so far, but this is preventing me from using it for language-modeling tasks.