igul222 / improved_wgan_training

Code for reproducing experiments in "Improved Training of Wasserstein GANs"
MIT License
2.35k stars · 668 forks

Too large loss value, both G and D #72

Open OwalnutO opened 6 years ago

OwalnutO commented 6 years ago

I reproduced the main code with tf.slim and tried to train a WGAN-GP model on my own dataset (image size 128×128), but I get very large loss values for both G and D during training, such as:

d_loss: 514810.00000000, g_loss: -49040.92578125
d_loss: 76364.06250000, g_loss: -404828.46875000

And the problem starts very early in training. Could anyone give me some suggestions? Thanks!
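For anyone debugging this: a common cause of exploding critic losses in WGAN-GP (not confirmed as the cause in this thread) is a missing or buggy gradient-penalty term. For reference, the penalty from the paper, `λ·(‖∇_x̂ D(x̂)‖₂ − 1)²`, can be sketched in PyTorch roughly like this; the `critic` callable and tensor shapes are assumptions, not code from this repo:

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP penalty: lam * (||grad_xhat D(xhat)||_2 - 1)^2, batch-averaged.
    Sketch only; assumes critic(x) returns one score per sample."""
    batch = real.size(0)
    # Random interpolation between real and fake samples.
    eps = torch.rand(batch, *([1] * (real.dim() - 1)), device=real.device)
    xhat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(xhat)
    # Gradient of the critic scores w.r.t. the interpolates.
    grads = torch.autograd.grad(
        outputs=scores, inputs=xhat,
        grad_outputs=torch.ones_like(scores),
        create_graph=True)[0]
    grad_norm = grads.view(batch, -1).norm(2, dim=1)
    return lam * ((grad_norm - 1) ** 2).mean()
```

If this term is accidentally dropped (or its sign or `λ` is wrong), the critic's Lipschitz constraint is not enforced and the losses can grow without bound.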

linkAmy commented 5 years ago

Unfortunately I have the same problem too. Did you solve it?

XingruiWang commented 4 years ago

Did you normalize your dataset before training?
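For context: GAN training pipelines commonly scale input images to [-1, 1] to match a tanh generator output. A minimal sketch of that preprocessing (the function name is illustrative, not from this repo):

```python
import numpy as np

def normalize_images(batch_uint8):
    """Scale uint8 pixel values from [0, 255] to [-1, 1] as float32.
    Common GAN preprocessing; matches a tanh generator output range."""
    return batch_uint8.astype(np.float32) / 127.5 - 1.0
```

Feeding raw [0, 255] pixels to a critic trained against [-1, 1] generator outputs is one plausible way to get extreme loss values.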

markemus commented 3 years ago

I have this same problem in tensorflow. Is the output layer of the discriminator really supposed to be linear? If I use tanh I get reasonable values (d_loss=[-2,0], g_loss=[-1,1]) and the model trains nicely.

123epsilon commented 3 years ago

> I have this same problem in tensorflow. Is the output layer of the discriminator really supposed to be linear? If I use tanh I get reasonable values (d_loss=[-2,0], g_loss=[-1,1]) and the model trains nicely.

@markemus could you comment on how exactly you solved this? I'm having a similar issue with 3x256x256 images using PyTorch; what exactly did you change? My loss is pretty erratic.

markemus commented 3 years ago

> > I have this same problem in tensorflow. Is the output layer of the discriminator really supposed to be linear? If I use tanh I get reasonable values (d_loss=[-2,0], g_loss=[-1,1]) and the model trains nicely.
>
> @markemus could you comment on how exactly you solved this? I'm having a similar issue with 3x256x256 images using PyTorch, what exactly did you change? My loss is pretty erratic.

I used tanh as the activation function on the output layer, and it worked well. But I suspect it slows down training.

123epsilon commented 3 years ago

> > > I have this same problem in tensorflow. Is the output layer of the discriminator really supposed to be linear? If I use tanh I get reasonable values (d_loss=[-2,0], g_loss=[-1,1]) and the model trains nicely.
> >
> > @markemus could you comment on how exactly you solved this? I'm having a similar issue with 3x256x256 images using PyTorch, what exactly did you change? My loss is pretty erratic.
>
> I used tanh as an activation function on the output layer, and it worked well. But I suspect it slows down training.

Sorry, what exactly did you use tanh on? The critic output?

markemus commented 3 years ago

@123epsilon yes, I used tanh as the activation function on the critic output layer to map the critic output from (-inf, inf) to (-1, 1). With that, the model was stable and trained well; without it, it was very unstable and did not converge, IIRC.

However, I suspect that mapping the output to that narrow range may have slowed down training. If possible, it might be better to solve this by adjusting learning rates or using a different optimizer, but I wasn't able to find a set of hyperparameters that worked. Tanh, on the other hand, worked beautifully.
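To make the tanh workaround concrete: the change discussed above can be sketched as a flag on the critic's output layer. This is a hypothetical PyTorch module, not the repo's model; note that a bounded critic deviates from the WGAN-GP paper, which uses a linear (unbounded) output:

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Minimal critic head illustrating the tanh trick from this thread.
    Hypothetical architecture; the real critic would be convolutional."""
    def __init__(self, in_features=128, bound_output=True):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_features, 64),
            nn.LeakyReLU(0.2),
            nn.Linear(64, 1),
        )
        self.bound_output = bound_output

    def forward(self, x):
        score = self.body(x)
        # tanh squashes the score into (-1, 1), which bounds both losses.
        # This is a stability hack, not the paper's formulation.
        return torch.tanh(score) if self.bound_output else score
```

With `bound_output=True`, d_loss and g_loss are confined to small ranges like those markemus reports; with `bound_output=False` you get the paper's unbounded linear critic.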