Open dayvidwang opened 6 months ago
I'm attempting to run the training script for the GPT-2 CALM model on the ClubFloyd dataset, following the instructions from your EMNLP 2020 paper. I've set up my environment as recommended, but I'm running into problems with training.
Environment:
Python version: 3.6.15
Operating System: Ubuntu 20.04
GPU: NVIDIA Titan RTX
Dependencies: torch==1.4, transformers==2.5.1, jericho, fasttext, wandb, importlib_metadata
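As a quick sanity check that these pinned versions are the ones actually loaded (a hypothetical snippet, not part of the repo):

```python
# Hypothetical sanity check: confirm the versions pinned above are installed.
import torch
import transformers

assert torch.__version__.startswith("1.4"), torch.__version__
assert transformers.__version__ == "2.5.1", transformers.__version__
print("torch", torch.__version__, "| transformers", transformers.__version__)
```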
Issue:
The training doesn't behave as expected: the model overfits the training data while validation performance barely improves or even worsens, and this persists after adjusting hyperparameters such as batch size and GPU count.
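For reference, here is a minimal sketch of selecting checkpoints by validation loss rather than training to convergence. It assumes the transformers==2.5.1 GPT-2 API (the loss is the first element of the returned tuple when labels are passed); `val_loader` and the epoch loop are hypothetical placeholders, not the repo's actual training code:

```python
# Minimal sketch: keep the best checkpoint by validation loss, since the
# training loss alone keeps dropping as the model overfits.
import torch
from transformers import GPT2LMHeadModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)

def avg_val_loss(model, val_loader):
    """Average LM loss over the validation set (transformers==2.5.1 API)."""
    model.eval()
    total, batches = 0.0, 0
    with torch.no_grad():
        for input_ids in val_loader:  # hypothetical dataloader of token-id tensors
            input_ids = input_ids.to(device)
            total += model(input_ids, labels=input_ids)[0].item()
            batches += 1
    return total / max(batches, 1)

# Inside the epoch loop (placeholders):
#   val = avg_val_loss(model, val_loader)
#   if val < best_val:
#       best_val = val
#       model.save_pretrained("best_ckpt")  # keep the best-so-far checkpoint
```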
Attempts:
batch size = 1 (two runs)
batch size = 15 (two runs)
batch size = 12
Request:
Do you have any ideas about why these training runs might not be converging? Could it be a hardware difference, a hyperparameter difference, or something else?
Thank you for your time.

Reply:

What do you mean by "not converging"? Also, if I remember correctly, the CALM model doesn't need a near-zero, or even a converging, training loss to function. Maybe just follow the codebase, run the RL experiments, and see the scores? The train/test losses of the LMs are not that valuable.
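To illustrate the point in the reply, here is a rough sketch of judging the model by game score with Jericho instead of by LM loss. The ROM path is a placeholder, and a random policy over valid actions stands in for the actual agent (the codebase reranks CALM's generated actions with a DRRN agent):

```python
# Rough sketch: the metric that matters is the game score from playing the
# environment, not the language model's train/validation loss.
import random
from jericho import FrotzEnv

env = FrotzEnv("roms/zork1.z5")  # hypothetical path to a game ROM
obs, info = env.reset()
for _ in range(100):             # cap episode length for the demo
    valid = env.get_valid_actions()      # candidate actions from Jericho
    if not valid:
        break
    action = random.choice(valid)        # stand-in for the trained agent
    obs, reward, done, info = env.step(action)
    if done:
        break
print("episode score:", info["score"])
```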