tobyyouup / conv_seq2seq

A TensorFlow implementation of Fairseq Convolutional Sequence to Sequence Learning (Gehring et al., 2017)
Apache License 2.0

Training loss goes down and then goes up #6

Open jingjing-gong opened 7 years ago

jingjing-gong commented 7 years ago

I am using part of your code, mainly conv_encoder_stack, to encode a sentence. I found it strange that the training loss goes down at first and then goes back up. Why would the training loss go up? Does it have anything to do with the weight norm? Do you have a theory on this?

harsh-agarwal commented 7 years ago

Have you changed the optimizer? What data are you training on?

jingjing-gong commented 7 years ago

I use AdamOptimizer. It's the first time I have observed the training loss go back up, e.g. from 1.2 -> 0.4 -> 1.0, and I have no idea why. It seems to get better when I lower the dropout rate. I am working on a new model on the SNLI dataset :). Hope somebody knows what's going on.

harsh-agarwal commented 7 years ago

If it gets better when you decrease the dropout, that means it's working as expected... so no worries, it's all about hyperparameter tuning :)

How many epochs have you trained the network for, and what's the batch size? My experience with Adam last time was something like this... so it might just require patience.

[image: img_0684 — training loss curve]

I had trained for almost 17 epochs :)

Hope this helps!

jingjing-gong commented 7 years ago

Batch size is set to 32 and lr to 0.0001. I trained for about 10 epochs, but the number of updates is huge since the data is abundant. So according to your plot it's normal that the training loss sometimes goes up? Mine doesn't go up rapidly, though; it climbs slowly and never comes back down. Do you think weight_norm is to blame, or the *tf.sqrt(0.5)? (See the weight-norm sketch after the log below.)

Here is my training log:

INFO 2017-09-20 21:59:53,107 Mean loss in 0th epoch is: 0.891537845135
INFO 2017-09-20 22:47:38,227 Mean loss in 1th epoch is: 0.707856357098
INFO 2017-09-20 23:35:30,202 Mean loss in 2th epoch is: 0.594837248325
INFO 2017-09-21 00:23:20,323 Mean loss in 3th epoch is: 0.538847446442
INFO 2017-09-21 01:11:41,955 Mean loss in 4th epoch is: 0.507788181305
INFO 2017-09-21 01:59:38,026 Mean loss in 5th epoch is: 0.499026745558
INFO 2017-09-21 02:47:03,504 Mean loss in 6th epoch is: 0.522499024868
INFO 2017-09-21 03:35:01,726 Mean loss in 7th epoch is: 0.620166659355
INFO 2017-09-21 04:22:40,464 Mean loss in 8th epoch is: 0.746133565903
INFO 2017-09-21 05:10:11,023 Mean loss in 9th epoch is: 1.05957996845
INFO 2017-09-21 05:57:23,671 Mean loss in 10th epoch is: 1.09116113186
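
For context on the weight_norm question: this presumably refers to the weight-normalization reparameterization of Salimans & Kingma (2016), where each filter is split into a direction and a learned scale. A minimal TF 1.x sketch of the idea, with illustrative names and shapes rather than the repo's exact code:

```python
import tensorflow as tf  # assuming TF 1.x, as used by this repo

# Weight normalization: w = g * v / ||v||. The filter's direction comes from v
# and its scale from the separate parameter g. Shapes below are illustrative.
v = tf.get_variable("v", shape=[3, 128, 256],  # [filter_width, in_dim, out_dim]
                    initializer=tf.random_normal_initializer(stddev=0.05))
g = tf.get_variable("g", shape=[256], initializer=tf.ones_initializer())

# Norm of v over everything except the output channel, kept for broadcasting.
v_norm = tf.sqrt(tf.reduce_sum(tf.square(v), axis=[0, 1], keep_dims=True))
w = g * v / v_norm  # effective convolution filter
```

Weight norm mainly changes the optimization dynamics (the scale is learned separately from the direction), so it tends to show up through its interaction with the learning rate and dropout rather than as a problem on its own.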
harsh-agarwal commented 7 years ago

Hi,

Did you try decreasing the learning rate? That might just solve the issue, as I said. Before the curve I showed you, my training curve looked like this :p

[screenshot: earlier training curve, 2017-09-22]

It would also help to print the loss every few iterations and to plot the validation loss alongside the training loss :) It just gives a better picture.

I had decreased the learning rate and that did the trick!
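
A minimal sketch of that suggestion, assuming TF 1.x: a toy graph stands in for the real conv_seq2seq model, and the point is just the lower Adam learning rate plus logging a held-out loss next to the training loss.

```python
import numpy as np
import tensorflow as tf  # assuming TF 1.x

# Toy regression graph; swap in the real model, feeds, and data in practice.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
pred = tf.layers.dense(x, 1)
loss = tf.reduce_mean(tf.squared_difference(pred, y))

# A lower learning rate than the 1e-4 mentioned above.
train_op = tf.train.AdamOptimizer(learning_rate=1e-5).minimize(loss)

rng = np.random.RandomState(0)
x_train = rng.randn(512, 10).astype(np.float32)
y_train = rng.randn(512, 1).astype(np.float32)
x_val = rng.randn(128, 10).astype(np.float32)
y_val = rng.randn(128, 1).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        _, tr = sess.run([train_op, loss], {x: x_train, y: y_train})
        if step % 100 == 0:
            # Printing validation loss alongside training loss gives the
            # "better picture" of whether the rise is overfitting or divergence.
            vl = sess.run(loss, {x: x_val, y: y_val})
            print("step %d  train %.4f  val %.4f" % (step, tr, vl))
```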

Can you elaborate a bit on the weight norm argument or the *tf.sqrt(0.5)?

I did not really get the reason for the *tf.sqrt(0.5)

Cheers,
Harsh
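
On the *tf.sqrt(0.5): in the ConvS2S paper (Gehring et al., 2017), the sum of a residual block's input and output is multiplied by sqrt(0.5) to halve the variance of the sum, so the activation scale stays roughly constant through the stack. A small NumPy sketch of that effect, under the simplifying assumption that the two terms are independent with unit variance:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(1000000)   # block input, ~unit variance
fx = rng.randn(1000000)  # block output, ~unit variance (assumed independent)

residual = x + fx                 # variance ~2 when the terms are independent
scaled = (x + fx) * np.sqrt(0.5)  # variance brought back to ~1

print(np.var(residual))  # ~2.0
print(np.var(scaled))    # ~1.0
```

So the factor is a variance-preserving constant; on its own it seems an unlikely cause of a loss that rises slowly over epochs.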

xxxzhi commented 6 years ago

@harsh-agarwal, my experience is the same as JerrikEph's. But why does it get better when I lower the dropout rate while using the Adam optimizer?

xxxzhi commented 6 years ago

@JerrikEph under-fitting?

harsh-agarwal commented 6 years ago

If your dropout rate is high, you are essentially asking the network to suddenly unlearn things and relearn them from other examples.

Decreasing the dropout makes sure that not too many neurons are deactivated. So if you are able to train the network with less dropout, that's better.
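
A tiny TF 1.x sketch of what the dropout rate controls (the tensors here are made up): keep_prob = 1 - dropout rate, so a higher dropout rate zeroes more activations and rescales the survivors, which is the sudden "unlearn and relearn" pressure described above.

```python
import tensorflow as tf  # assuming TF 1.x

x = tf.ones([1, 8])
mild = tf.nn.dropout(x, keep_prob=0.9)        # dropout rate 0.1: few units zeroed
aggressive = tf.nn.dropout(x, keep_prob=0.5)  # dropout rate 0.5: about half zeroed

with tf.Session() as sess:
    print(sess.run(mild))        # mostly 1/0.9 ≈ 1.11, occasional zeros
    print(sess.run(aggressive))  # roughly half zeros, survivors scaled to 2.0
```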

xxxzhi commented 6 years ago

Oh, thanks! @harsh-agarwal