yandex / faster-rnnlm

Faster Recurrent Neural Network Language Modeling Toolkit with Noise Contrastive Estimation and Hierarchical Softmax

LSTMs #18

elmarhaussmann closed this issue 8 years ago

elmarhaussmann commented 8 years ago

Thanks for the great software! I don't think there is anything comparable that can handle such huge vocabularies and amounts of data!

LSTMs have shown superior performance for language models in many recent papers. Would it make sense to add LSTM as an available hidden unit, or is there a particular reason it isn't implemented? Would implementing it in the current code involve any significant hurdles or effort?

akhti commented 8 years ago

LSTM and GRU have similar performance (e.g. see http://arxiv.org/pdf/1412.3555.pdf). However, a GRU requires fewer matrices and should therefore be more stable under ASGD. So you can use it instead of LSTM: --hidden-type gru-full.
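For reference, here is a standard GRU formulation (following the Chung et al. paper linked above; the notation is generic, not taken from the toolkit's source). A GRU has three input/recurrent weight-matrix pairs, while an LSTM needs four plus a separate memory cell, which is why each ASGD update touches fewer parameters:

```latex
% Gated Recurrent Unit (Chung et al., 2014).
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1})                             && \text{update gate} \\
r_t &= \sigma(W_r x_t + U_r h_{t-1})                             && \text{reset gate} \\
\tilde{h}_t &= \tanh\!\left(W x_t + U (r_t \odot h_{t-1})\right) && \text{candidate state} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t           && \text{new hidden state}
\end{aligned}
```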

An LSTM implementation would not be much harder, it's just not here ;)
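A minimal invocation sketch using that option. Aside from --hidden-type gru-full (quoted above), the file names, sizes, and remaining flags (-rnnlm, -train, -valid, -hidden, -nce, -threads) follow the project's README-style usage and are placeholders, not taken from this thread:

```sh
# Train a GRU-based model with NCE; all paths and sizes are illustrative.
./rnnlm -rnnlm model.bin -train train.txt -valid valid.txt \
        -hidden 256 --hidden-type gru-full -nce 20 -threads 8
```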

elmarhaussmann commented 8 years ago

Yes, thanks for the hint! I realized the same in the meantime. For example, http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf also shows that GRUs are at least as good as, or even better than, LSTMs. They also apply dropout for the larger models. I was trying to get close to the state of the art on PTB but couldn't get test perplexity below ~110. It seems dropout helps quite a bit there (for the larger models). Any plans on implementing that? ;)
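For context, a minimal sketch of inverted dropout as it is usually applied to a hidden-layer activation. This is a generic illustration, not code from faster-rnnlm, and the function name ApplyDropout is hypothetical:

```cpp
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

// Inverted dropout on a hidden-layer activation vector (generic sketch,
// not faster-rnnlm code). Each unit is zeroed with probability p at train
// time and survivors are scaled by 1/(1-p), so the test-time forward pass
// needs no rescaling.
void ApplyDropout(float* hidden, std::size_t size, float p, std::mt19937& rng) {
  std::bernoulli_distribution keep(1.0 - p);
  const float scale = 1.0f / (1.0f - p);
  for (std::size_t i = 0; i < size; ++i) {
    hidden[i] = keep(rng) ? hidden[i] * scale : 0.0f;
  }
}

int main() {
  std::mt19937 rng(42);
  std::vector<float> hidden(8, 1.0f);  // toy activations
  ApplyDropout(hidden.data(), hidden.size(), 0.5f, rng);
  for (float v : hidden) std::printf("%.1f ", v);  // survivors scaled to 2.0
  std::printf("\n");
}
```

Note that in the Zaremba-style setup commonly used for PTB, dropout like this is applied only to the non-recurrent connections, not to the hidden-to-hidden path.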