zhongkaifu / RNNSharp

RNNSharp is a toolkit for deep recurrent neural networks that is widely used for many different kinds of tasks, such as sequence labeling, sequence-to-sequence modeling and so on. It is written in C# and based on .NET Framework 4.6 or above. RNNSharp supports many different types of networks, such as forward and bi-directional networks and sequence-to-sequence networks, and different types of layers, such as LSTM, Softmax, sampled Softmax and others.
BSD 3-Clause "New" or "Revised" License

Not converging due to learning rate alpha #6

Closed: bratao closed this issue 8 years ago

bratao commented 8 years ago

Hello. While testing RNNSharp I was unable to make the model converge, no matter what settings I used. I changed two places and finally got the model to converge:

if (ppl >= lastPPL && lastAlpha != rnn.LearningRate)
 {
  //Although we reduce alpha value, we still cannot get better result.

I changed it to break only after we have tried to lower alpha 8 times and failed to get an improvement.

I also changed this fragment: rnn.LearningRate = rnn.LearningRate / 2.0f;

to decay at a slower rate; in my case, I divide by 1.4 instead.
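
Roughly, the combined change looks like the sketch below. It is only an illustration of the idea; the names (TrainOneEpoch, MaxFailedReductions and so on) are placeholders rather than RNNSharp's actual code.

using System;

// Sketch of the two changes: (1) stop only after 8 failed learning-rate
// reductions, and (2) divide the learning rate by 1.4 instead of 2.0.
class AnnealingSketch
{
    public float LearningRate = 0.1f;

    const int MaxFailedReductions = 8;   // the original check stopped after a single failed reduction
    const float DecayFactor = 1.4f;      // the original code used 2.0f

    public void Train(Func<float, double> trainOneEpoch)
    {
        double lastPpl = double.MaxValue;
        int failedReductions = 0;

        while (true)
        {
            // One pass over the training corpus; returns perplexity measured
            // on the validation corpus.
            double ppl = trainOneEpoch(LearningRate);

            if (ppl >= lastPpl)
            {
                // No improvement: anneal the learning rate, but only give up
                // after several consecutive reductions have failed to help.
                failedReductions++;
                if (failedReductions >= MaxFailedReductions)
                {
                    break;
                }
                LearningRate /= DecayFactor;
            }
            else
            {
                failedReductions = 0;
                lastPpl = ppl;
            }
        }
    }
}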

Do any of those changes make sense?

Do you think a smarter learning rate annealing schedule could improve RNNSharp, or am I looking in the wrong place?

Thanks!!

zhongkaifu commented 8 years ago

First of all, the learning rate (alpha) is an open research question. There are many papers related to this problem. Some dynamic learning rate schedules may give better results, and sometimes the learning rate strategy depends on your data set.
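
For example, one common dynamic strategy is exponential decay with a floor; the sketch below is just an illustration and is not part of RNNSharp.

using System;

// A simple dynamic learning-rate schedule: initialRate * decay^epoch,
// never dropping below minRate.
static class LearningRateSchedule
{
    public static float ExponentialDecay(float initialRate, float decay, int epoch, float minRate)
    {
        float rate = initialRate * (float)Math.Pow(decay, epoch);
        return Math.Max(rate, minRate);
    }
}

// Example usage: float alpha = LearningRateSchedule.ExponentialDecay(0.1f, 0.85f, epoch, 1e-4f);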

Secondly, you mentioned that the result doesn't converge, but I want to make sure which corpus you mean: the training corpus or the validation corpus? Since both CRF++ and CRFSharp don't use a validation or test corpus, a better result on the training corpus may just mean overfitting on the test corpus. So my suggestion is that it would be more reasonable to create a test corpus and measure the quality of the CRF++/CRFSharp and RNNSharp encoded models on it.
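
For example, a simple token-level accuracy check on the held-out test corpus is enough for a first comparison; this is generic evaluation code, not an RNNSharp tool.

// Compares gold and predicted tag sequences and returns token-level accuracy.
// goldSentences[i][j] / predictedSentences[i][j] are the tags of token j in sentence i.
static class TagAccuracy
{
    public static double Compute(string[][] goldSentences, string[][] predictedSentences)
    {
        int correct = 0, total = 0;
        for (int i = 0; i < goldSentences.Length; i++)
        {
            for (int j = 0; j < goldSentences[i].Length; j++)
            {
                total++;
                if (goldSentences[i][j] == predictedSentences[i][j])
                {
                    correct++;
                }
            }
        }
        return total == 0 ? 0.0 : (double)correct / total;
    }
}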

zhongkaifu commented 8 years ago

In addition, I have fixed the RNNSharp crashing bug that occurred when the word embedding feature isn't enabled for SimpleRNN. You can get the fix by syncing the latest code from the code base.

bratao commented 8 years ago

@zhongkaifu, I was testing the quality against a separate test corpus.

Yeah, I saw the latest commits, thank you so much!! It is also working without word embeddings in LSTM-RNN for me. (LSTM-RNN is faster and produces better results than SimpleRNN in my case.)

One commit said that it would support training a model without a validation corpus. However, I'm not getting good results with it. I get better results (on a separate test corpus) when I use a validation corpus equal to the training corpus.

zhongkaifu commented 8 years ago

Yes. A validation corpus is usually required during training in order to get better hyper-parameters, but sometimes we just use the training corpus for some experiments. :)