weilinie / RelGAN

Implementation of RelGAN: Relational Generative Adversarial Networks for Text Generation

Overfitting during Pretraining #13

Closed abhishek18124 closed 4 years ago

abhishek18124 commented 4 years ago

Did you encounter overfitting during generator pre-training? If so, how did you mitigate it? Also, can you shed some light on the choice of loss function used during pre-training, and how it differs from the TensorFlow implementation of CategoricalCrossEntropy?

weilinie commented 4 years ago

The loss function we used is the categorical cross entropy. During generator pre-training, I didn't see any sign of overfitting. Could you provide more details about how your pre-trained model overfits? One thing you can try is to reduce the memory size in the generator.
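
For concreteness, here is a minimal sketch of that objective (shapes and tensor names are illustrative placeholders, not the exact code in this repo): categorical cross entropy between one-hot real tokens and the generator's per-step distribution over the vocabulary.

```python
# Minimal sketch of MLE pre-training with categorical cross entropy.
# Shapes and tensors are illustrative placeholders, not the repo's variables.
import tensorflow as tf

batch_size, seq_len, vocab_size = 64, 20, 5000

# real token ids and the generator's per-step logits over the vocabulary
real_tokens = tf.random.uniform([batch_size, seq_len],
                                maxval=vocab_size, dtype=tf.int32)
gen_logits = tf.random.normal([batch_size, seq_len, vocab_size])

# one-hot targets, then categorical cross entropy averaged over batch and time
real_onehot = tf.one_hot(real_tokens, vocab_size)
pretrain_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=real_onehot,
                                            logits=gen_logits))
```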

abhishek18124 commented 4 years ago

I am using the RelGAN model on a custom dataset, which involves generating multiple outputs at each time step. Under this setting, I encountered overfitting after 10-15 epochs of generator pre-training.

As far as the loss during pre-training is concerned, I was interested in knowing whether there are any implementation differences between the one used in your code and the TensorFlow CategoricalCrossEntropy.

Also, can you tell me what criteria you were monitoring during generator pre-training, and how that led you to choose 150 epochs?

weilinie commented 4 years ago

Thanks for providing more details. I didn't go into the details of the TF CategoricalCrossEntropy function, but from their description we are doing the same thing. I used the BLEU and NLL_gen scores to monitor the pre-training progress. As you can see in the paper, these scores stay pretty much the same around 100 epochs, so we stopped the pre-training there. Different datasets, depending on their size, structure, and sequence length, usually require different numbers of pre-training epochs and different model sizes. Maybe you can look into these and tune your training accordingly.
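
To illustrate why the two should agree, here is a small sketch (arbitrary shapes, not code from the repo): a hand-rolled categorical cross entropy over one-hot targets matches tf.keras.losses.CategoricalCrossentropy up to reduction and numerical-clipping details.

```python
# Sketch: hand-rolled categorical cross entropy vs. the built-in Keras loss.
# Shapes are arbitrary; this only shows that the two agree numerically.
import tensorflow as tf

logits = tf.random.normal([8, 5000])                      # per-token logits
targets = tf.one_hot(
    tf.random.uniform([8], maxval=5000, dtype=tf.int32), 5000)

# manual version: -sum(onehot * log(softmax)), averaged over the batch
probs = tf.nn.softmax(logits)
manual = -tf.reduce_mean(tf.reduce_sum(
    targets * tf.math.log(tf.clip_by_value(probs, 1e-20, 1.0)), axis=-1))

# built-in version
builtin = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(targets, logits)

print(float(manual), float(builtin))  # the two values should be (nearly) identical
```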