iedmrc opened this issue 5 years ago
FWIW, I've replicated this for another project with 100k steps on a P100. The final loss was somewhere between 0.85 and 0.98, IIRC. I trained on the 345M model. The params were the same as in @minimaxir's gpt-2-simple Colab notebook.
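For reference, the call looked roughly like this (a minimal sketch, not the notebook's exact settings; `dataset.txt` and the step/save intervals are placeholders):

```python
import gpt_2_simple as gpt2

# Fetch the 345M checkpoint once (~1.4 GB).
gpt2.download_gpt2(model_name="345M")

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="dataset.txt",  # placeholder: your training corpus
              model_name="345M",
              steps=100000,           # the run discussed above
              save_every=1000,        # checkpoint interval
              sample_every=1000)      # periodically print generated samples
```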
Thanks for the answer! Was the result (the samples it generated) satisfactory for you? How long did it take to train 100k steps on a P100 in your case?
> satisfactory
That term is pretty ambiguous. I certainly saw no significant improvement after about 50k steps; by that point the coherence/funniness/uniqueness more or less matched the sample batches in this repo. I saw slightly more interesting results when I mixed in titles from my medium125k dataset.
> How long did it take to train 100k steps on a P100 in your case?
I'm using a Scaleway P100 instance (1 EUR/h). It took me two days, though not with continuous training; each 50k-step segment took about 12 to 14 hours, IIRC.
The T4 Colab environment should suffice to train at a reasonable speed, though I have not yet found a way to reliably get a T4; I get K80s about half the time when creating a notebook. TPUs might be worth exploring.
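For what it's worth, you can check which GPU Colab assigned by shelling out to `nvidia-smi` (a generic check, nothing gpt-2-simple-specific):

```python
import subprocess

# Lists the GPU(s) attached to the runtime, e.g. "GPU 0: Tesla T4 (...)".
print(subprocess.run(["nvidia-smi", "-L"],
                     capture_output=True, text=True).stdout)
```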
Thanks for sharing your experiences!
Yeah, a final loss slightly below 1.0 sounds about right.
FWIW, in my work I don't really pay attention to the absolute value of the loss, just whether it's going down.
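A cheap way to eyeball the trend is to smooth the losses the training loop prints, e.g. (the numbers below are made up; paste in your own log values):

```python
# Smooth a noisy loss curve to judge the trend rather than single readings.
def ema(values, alpha=0.1):
    """Exponential moving average (TensorBoard-style smoothing)."""
    smoothed, prev = [], values[0]
    for v in values:
        prev = alpha * v + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed

losses = [3.10, 2.41, 1.95, 1.62, 1.41, 1.33, 1.27, 1.22]  # placeholder values
print(ema(losses))
```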
@minimaxir What about calculating validation loss? As far as I can see, gpt-2-simple calculates training loss but not validation loss. What would you suggest for evaluating the trained model?
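One workaround I can think of is to hold out a slice of the dataset and score it with an equivalent GPT-2 implementation. A rough sketch using HuggingFace transformers (not gpt-2-simple's own API; `val.txt` is a placeholder, and to evaluate the actual finetuned model you'd load its weights rather than the stock `gpt2-medium` checkpoint):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")  # 345M-class model
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
model.eval()

ids = tokenizer(open("val.txt", encoding="utf-8").read(),
                return_tensors="pt").input_ids

window = 1024  # GPT-2's maximum context length
losses = []
with torch.no_grad():
    for i in range(0, ids.size(1) - 1, window):
        chunk = ids[:, i:i + window]
        if chunk.size(1) < 2:
            break
        # Passing labels=input_ids makes the model return the mean
        # next-token cross-entropy over the chunk.
        losses.append(model(chunk, labels=chunk).loss.item())

print(f"validation loss: {sum(losses) / len(losses):.3f}")
```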
Hi, in the README file it says:
My questions are:
Answers to these questions would give us more intuition when training gpt-2-simple on our datasets.
Thanks for your answer!