Closed: rvinas closed this issue 4 years ago
@jsyoon0823 do you know why this happens? Your input would be appreciated
Hello,
The default hyper-parameters should be optimized for each dataset. GAN training needs some extra care when tuning hyper-parameters such as the number of iterations, batch size, and hint rate. Also keep checking that the discriminator and generator are well balanced. With some hyper-parameter optimization, I can achieve an RMSE of 0.0513, which cannot be achieved with the MSE loss alone.
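One way to "keep checking" the balance is a small heuristic on the discriminator loss during training. The helper and thresholds below are purely illustrative (not from gain.py, assuming a standard cross-entropy discriminator loss where near-zero means D is overpowering G):

```python
def balance_check(d_loss, low=0.4, high=1.4):
    """Hypothetical heuristic for GAN balance, based on the usual
    observation that a near-zero discriminator loss means D is
    overpowering G, and a very large one means the opposite.
    The thresholds `low`/`high` are illustrative and dataset-dependent.
    """
    if d_loss < low:
        # D wins too easily: consider extra generator steps or a lower hint rate
        return "discriminator_too_strong"
    if d_loss > high:
        # G fools D too easily: consider extra discriminator steps
        return "generator_too_strong"
    return "balanced"
```

Logging this signal every few hundred iterations is usually enough to see whether a hyper-parameter change (batch size, hint rate, iteration count) pushes training out of balance.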
@jsyoon0823 thank you for the suggestions. What would be a good choice of hyperparameters for the Spam dataset?
I am mostly struggling to get decent performance without the supervised loss. For example, when I set alpha=0, in the best case I only get an RMSE of ~0.2 on the Spam dataset (far from the reported average RMSE of ~0.07). Do you have any suggestions on how to tune the hyperparameters for this particular case? Thanks a lot for your help.
Without the supervised loss, you need to control the GAN training more carefully. The supervised loss has a regularizing effect, so it stabilizes GAN training; without it, training is somewhat more unstable.
In that case, you should do early stopping (or best-model saving) using the supervised loss as the criterion. Even though you do not use the supervised loss directly for training the model, you can still use it for early stopping. That should let you reach the reported performance.
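The suggestion above can be sketched as a best-model-saving loop keyed on the supervised (reconstruction) loss over observed entries, which is monitored but never back-propagated when alpha=0. All names here are illustrative, not taken from gain.py:

```python
def train_with_early_stopping(steps, eval_supervised_loss, train_step, patience=200):
    """Sketch: run `train_step` for up to `steps` iterations, track the
    supervised loss via `eval_supervised_loss` (a callable taking the
    step index), and stop once it has not improved for `patience` steps.
    Returns the best step and its loss; a real implementation would also
    checkpoint the model weights at each improvement.
    """
    best_loss = float("inf")
    best_step = 0
    for step in range(steps):
        train_step(step)                  # GAN update (adversarial loss only)
        loss = eval_supervised_loss(step)  # MSE on observed entries, monitoring only
        if loss < best_loss:
            best_loss, best_step = loss, step  # save checkpoint here
        elif step - best_step > patience:
            break  # no improvement for `patience` steps: stop
    return best_step, best_loss
```

The `patience` value is a tuning knob like any other hyper-parameter; too small and you stop before the GAN has converged, too large and you may keep an over-trained generator.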
Hi Yoon,
I was wondering how the results in Table 1 were obtained. I have been playing around with the code, and it is not clear to me that the adversarial loss is helping (as reported in the results section, specifically Table 1).
For example, when I run the code on the SPAM dataset (default implementation and hyperparameters), the RMSE is ~0.053. However, when I set the adversarial loss to 0 by modifying the following line in gain.py:

G_loss_temp = -tf.reduce_mean((1-M) * tf.log(D_prob + 1e-8)) * 0.

the RMSE is also ~0.053. Am I missing anything? I am observing something similar on another dataset. I am considering using GAIN for a project and would greatly appreciate an explanation of how the results in Table 1 were obtained.
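For this kind of ablation, a cleaner alternative to hard-coding `* 0.` is to put an explicit weight on the adversarial term. The sketch below is a plain-Python restatement of the generator objective (the `adv_weight` knob and function names are my own additions, not part of gain.py; `alpha` weights the reconstruction term as in the paper):

```python
import math

def generator_loss(d_prob_missing, mse_observed, adv_weight=1.0, alpha=100.0):
    """Sketch of a GAIN-style generator objective with an `adv_weight`
    knob for ablation: adv_weight=0.0 reproduces the `* 0.` edit above,
    adv_weight=1.0 restores the full adversarial term.
    `d_prob_missing`: discriminator outputs on the imputed (missing) entries.
    `mse_observed`: reconstruction MSE on the observed entries.
    """
    eps = 1e-8  # numerical stability, mirroring the 1e-8 in the TF code
    # adversarial term: encourage D to believe the imputed entries are real
    adv = -sum(math.log(p + eps) for p in d_prob_missing) / len(d_prob_missing)
    return adv_weight * adv + alpha * mse_observed
```

Sweeping `adv_weight` over, say, {0.0, 0.1, 1.0} makes the contribution of the adversarial loss visible directly, rather than toggling it by editing the source.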