yandex / faster-rnnlm

Faster Recurrent Neural Network Language Modeling Toolkit with Noise Contrastive Estimation and Hierarchical Softmax

Learning rate for 1B corpus #22

Open jhlau opened 8 years ago

jhlau commented 8 years ago

Hi, I am training on a Wikipedia corpus with 1B tokens, using sigmoid/gru with hidden count 1/2/3. An initial learning rate of 0.01 gave me pretty good results when I was working with the 100M Wikipedia corpus, but on the 1B corpus, after a couple of epochs of training, both sigmoid and gru start giving me NaN entropy. Just curious, what learning rate did you use for the 1B benchmark corpus? I have now set it to 0.001 and am hoping the gradients won't explode.
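
For reference, a sketch of the kind of invocation I mean, with the learning rate lowered to 0.001 (flag names follow the faster-rnnlm README, where `-alpha` sets the initial learning rate; the file paths and hidden size here are placeholders, not my exact setup):

```
# Hypothetical training run: drop -alpha from 0.01 to 0.001 to try to
# avoid exploding gradients / NaN entropy on the 1B-token corpus.
./rnnlm -rnnlm model.bin \
        -train wiki-1b.train.txt -valid wiki-1b.valid.txt \
        -hidden 256 -hidden-type gru -hidden-count 2 \
        -nce 20 \
        -alpha 0.001
```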