Open iriscxy opened 6 years ago
Do the reward and loss become 'nan' all the time? At which step? The 2 pre-trained NMT models impact the result a lot. How about pre-training longer, or trying other data, and seeing what happens?
I met the same problem... I tried pre-training on 20% of the data for 100 epochs, and also on a better dataset, but it still failed. My tutor told me to lower the learning rate from 1e-3 to 1e-5, and it ran longer than before, but still failed after about 200 steps; now 1e-6 is running... So is a lower learning rate really useful? It seems 1e-3 fails very quickly (about 50 steps), and lowering the lr only delays the failure. And what does nan mean here? That the language model fails? That the NMT fails? Thanks for your patience :)
I finally found that I met the same problem as you: when it tries to generate words in beam(), the new_hyp_scores turn to nan at about 1000 steps. I then changed the learning rate from 1e-3 to 1e-5 as suggested above, and it ran longer than before. I think this result shows that the NMT model must be trained more. As a next step I want to change the optimizer, e.g. to Adam. If you find any useful methods, please tell me how to do it. Thank you :)
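Since the nan first appears in new_hyp_scores inside beam(), one defensive option (my own suggestion, not from the repo) is to mask non-finite scores before selecting hypotheses, so a single bad score cannot poison the whole beam. A minimal NumPy sketch, where `new_hyp_scores` stands in for the variable of the same name:

```python
import numpy as np

def safe_hyp_scores(new_hyp_scores):
    """Replace non-finite beam scores (nan/inf) with -inf so those
    hypotheses are never selected, instead of silently propagating
    nan through the top-k selection."""
    scores = np.asarray(new_hyp_scores, dtype=np.float64)
    scores[~np.isfinite(scores)] = -np.inf
    return scores
```

This does not fix the underlying divergence, but it makes the failure visible (the degenerate hypotheses simply lose) instead of crashing the scores.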
We have tried Adam before, but the result was bad. We think the reason may be that Adam adapts the learning rate constantly, and since the translation loss is not smooth, this makes the training process go out of control, so the loss can't decrease.
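Besides switching optimizers, a standard remedy for this kind of blow-up (not mentioned in the thread, so treat it as an assumption about what would help here) is clipping the global gradient norm before each update. A minimal sketch of the idea in NumPy:

```python
import numpy as np

def clip_global_norm(grads, max_norm=1.0):
    """Scale a list of gradient arrays so their combined L2 norm
    does not exceed max_norm -- the usual guard against a single
    spiky RL reward producing exploding gradients and, eventually,
    a nan loss."""
    total = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total > max_norm:
        scale = max_norm / (total + 1e-6)
        grads = [g * scale for g in grads]
    return grads
```

In PyTorch the equivalent is `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` called between `loss.backward()` and `optimizer.step()`.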
After several steps (about 20), with a learning rate of 1e-6 (maybe small enough... 1e-3 was also tried, and the loss turned to nan after 2 steps, even before saving a model...), the loss turns to nan again... I've tried retraining the NMT model for about 1M iterations, reaching a BLEU of about 33.7 for modelA and 15.5 for modelB, but it just won't work... Does this mean that whether the method works depends heavily on the data? Or on the NMT model?
I think the nan problem comes from the reward calculation: the reward is divided by its std, but the std can be zero, so changing the reward formula may solve the problem.
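The division-by-zero-std issue above has a common one-line fix: add a small epsilon to the denominator when standardising the rewards. A minimal sketch (the function name and eps value are my own, not from the repo):

```python
import numpy as np

def normalize_rewards(rewards, eps=1e-8):
    """Standardise a batch of RL rewards. The eps term keeps the
    division finite when every sample in the batch gets the same
    reward (std == 0), which otherwise yields nan."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)
```

With a constant-reward batch this returns all zeros instead of nan, which matches the diagnosis that the nan originates in the reward normalisation rather than in the NMT models themselves.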
I also ran into this problem. Has anyone found a method to solve it?
As training moves on, the reward and loss all become 'nan'. Has this problem existed in your data? A sample from the log:

A -> B
('[s]', 'Old power means the fossil ##AT##-##AT## nuclear energies : oil , natural gas , coal and uranium exploited in centralised , monopolistic energy systems supported by short ##AT##-##AT## term thinking politics .')
('[smid]', 'Interaktion Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Fachkompetenz Schecks')
r1= nan  r2= nan  rk= nan  fw_loss= nan  bw_loss= nan
A loss = nan
B loss = nan