janmbuys closed this issue 6 years ago
Update: Implemented the above (as ptb_main.py).
However, we still need to implement standard regularization and optimization techniques to get reasonable results.
Note that LSTM regularization techniques might not carry over to the HMM: because of the hidden state bottleneck, the HMM will have a harder time memorizing the training data, so it may need less regularization.
I hope there's an error in the perplexity?
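One quick way to check whether a perplexity figure is plausible is to compare it against a uniform baseline. The sketch below is not taken from ptb_main.py; the function name `perplexity` and the 10,000-word vocabulary (the size of the standard PTB language-modelling split) are assumptions made for illustration. It computes perplexity as the exponential of the mean per-token negative log-likelihood.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Hypothetical helper: exp of the mean negative log-likelihood.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Assumed vocabulary size for the uniform baseline.
vocab_size = 10_000

# A model that assigns every word probability 1/V has perplexity V.
uniform_log_probs = [math.log(1.0 / vocab_size)] * 50
baseline = perplexity(uniform_log_probs)
print(baseline)
```

If the reported perplexity is above this baseline, the model is doing worse than guessing uniformly. That usually points to a bug (for example, averaging over batches instead of tokens, or mixing log bases) rather than to a weak model.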
Update: Implemented dropout and basic SGD optimization strategies.
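For readers following along, here is a minimal, framework-free sketch of the two ingredients mentioned above. It is not the code in ptb_main.py: the helper names `dropout` and `sgd_step` are hypothetical, and gradient-norm clipping is an assumption about what "basic SGD optimization strategies" might include.

```python
import math
import random


def dropout(values, p, rng, training=True):
    """Inverted dropout: zero each value with probability p and
    rescale the survivors by 1 / (1 - p) so the expectation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)  # identity at evaluation time
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def sgd_step(params, grads, lr, max_norm=None):
    """One plain SGD update, optionally clipping the gradient L2 norm
    (clipping is an assumed extra, not confirmed by the thread)."""
    if max_norm is not None:
        norm = math.sqrt(sum(g * g for g in grads))
        if norm > max_norm:
            scale = max_norm / norm
            grads = [g * scale for g in grads]
    return [w - lr * g for w, g in zip(params, grads)]


rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(hidden, p=0.5, rng=rng)       # training-time activations
unchanged = dropout(hidden, p=0.5, rng=rng, training=False)
updated = sgd_step([1.0, 2.0], [3.0, 4.0], lr=0.1, max_norm=1.0)
```

Because the surviving activations are rescaled during training, nothing needs to change at evaluation time, which keeps the perplexity computation on the validation set free of dropout-specific code.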