Open ghost opened 6 years ago
I also have this problem.
I have this problem too. The perplexity becomes NaN after about 10000 steps.
Did you try with a lower learning rate, like 0.1?
Lowering the learning rate fixed the problem. I think Adam with a learning rate of 0.001 is better for GRU.
@RyanMolina but the optimizer is SGD? So is the problem that the optimizer for GRU can't be set to SGD, or is it another bug?
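For anyone wondering why the learning rate alone can make perplexity blow up, here is a toy sketch in plain Python. It is not the NMT model and not the GRU code. It just runs plain SGD on a one-parameter quadratic loss, and the names `sgd_losses`, `perplexity`, and `curvature` are made up for illustration. Under those assumptions, a step size that is too large makes every update overshoot, so the loss grows and exp(loss) overflows, while a smaller step size with the same optimizer converges.

```python
import math


def sgd_losses(learning_rate, steps=200, curvature=3.0, w0=1.0):
    """Run plain SGD on the toy loss 0.5 * curvature * w**2.

    Returns the loss recorded before each update. This is a hypothetical
    stand-in for a training loop, not the real model.
    """
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w * w)
        grad = curvature * w       # derivative of the toy loss w.r.t. w
        w -= learning_rate * grad  # plain SGD update
    return losses


def perplexity(loss):
    """Return exp(loss), or inf when the loss is too large for a float."""
    try:
        return math.exp(loss)
    except OverflowError:
        return float("inf")


high = sgd_losses(learning_rate=1.0)  # each step multiplies w by (1 - 3.0) = -2
low = sgd_losses(learning_rate=0.1)   # each step multiplies w by (1 - 0.3) = 0.7

print("lr=1.0 final perplexity:", perplexity(high[-1]))
print("lr=0.1 final perplexity:", perplexity(low[-1]))
```

In this toy case SGD itself is fine. Only the step size decides whether the loss shrinks or explodes, which matches the report above that lowering the learning rate fixed the problem.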
Hi,
I tried training my system with a GRU unit instead of LSTM. Surprisingly, ppl is increasing with every step. Check here:
Major parameters used are: