minimaxir / hacker-news-gpt-2

Dump of generated texts from GPT-2 trained on Hacker News titles
MIT License

What was the latest average loss? #1

iedmrc opened this issue 5 years ago

iedmrc commented 5 years ago

Hi, in the README file, it says:

Dump of generated texts from gpt-2-simple trained on Hacker News titles until April 25th, 2019 (about 603k titles, 30MB of text) for 36,813 steps (12 hours w/ a P100 GPU, costing ~$6). The output is definitely not similar to that of Markov chains.

My questions are:

  1. What was the latest avg_loss you've reached?
  2. Which model (117M or 345M) did you train it with?
  3. Which parameters (especially the learning rate) did you use?

Answers to these questions would give us more intuition when training gpt-2-simple on our own datasets.
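For reference, this is roughly what a gpt-2-simple fine-tuning call looks like. The dataset file name is a placeholder, the step count is taken from the README quote above, and the learning rate shown is just the library's documented default, not a confirmed answer to question 3:

```python
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="117M")       # or "345M"

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="hn_titles.txt",      # placeholder: one title per line
              model_name="117M",
              steps=36813,                  # step count from the README quote
              learning_rate=1e-4,           # gpt-2-simple's documented default
              sample_every=500,
              save_every=500)
gpt2.generate(sess)
```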

Thanks for your answer!

turbo commented 5 years ago

FWIW, I've replicated this for another project with 100k steps on a P100. The final loss was somewhere between 0.85 and 0.98, IIRC. I trained the 345M model. The params were the same as in @minimaxir's gpt-2-simple Colab notebook.

iedmrc commented 5 years ago

Thanks for the answer! Was the result (the samples it generated) satisfactory for you? How long did it take to train 100k steps on a P100 in your case?

turbo commented 5 years ago

> satisfactory

That term is pretty ambiguous. I certainly saw no significant improvements after about 50k steps. The coherence/funniness/uniqueness matched the sample batches in this repo more or less at that point. I saw slightly more interesting results when I mixed in titles from my medium125k dataset.
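A minimal sketch of that kind of mixing, with hypothetical file names (gpt-2-simple just consumes a single plain-text file):

```python
import random

# Combine two one-title-per-line files and shuffle so the sources interleave.
with open("hn_titles.txt") as f:
    titles = f.read().splitlines()
with open("medium_titles.txt") as f:   # e.g. titles from the medium125k dataset
    titles += f.read().splitlines()

random.shuffle(titles)
with open("mixed_titles.txt", "w") as f:
    f.write("\n".join(titles))
```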

> How long did it take to train 100k steps on a P100 in your case?

I'm using a Scaleway P100 instance (1 EUR/h). It took me two days, though not with continuous training. Each 50k-step segment took about 12 to 14 hours, IIRC.
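Back-of-envelope cost from those figures, taking the midpoint of the 12-14 h estimate:

```python
hours_per_50k = 13       # midpoint of the 12-14 h per 50k steps above
eur_per_hour = 1.0       # Scaleway P100 rate
steps = 100_000

cost = (steps / 50_000) * hours_per_50k * eur_per_hour
print(f"~{cost:.0f} EUR for {steps:,} steps")   # ~26 EUR
```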

The T4 Colab env should suffice to train at a reasonable speed, though I haven't yet found a way to get a T4 env reliably; I get a K80 about half the time when creating a notebook. TPUs might be worth exploring.
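One way to check which GPU a fresh Colab runtime was assigned before committing to a long run (reset the runtime and retry if it's a K80):

```python
import subprocess

# Lists the attached GPU(s), e.g. "GPU 0: Tesla T4 (UUID: ...)".
print(subprocess.check_output(["nvidia-smi", "-L"]).decode())
```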

iedmrc commented 5 years ago

Thanks for sharing your experiences!

minimaxir commented 5 years ago

Yeah, a final loss slightly below 1.0 sounds about right.

FWIW, in my work I don't really pay attention to the absolute value of the loss, just whether it's going down.
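A minimal sketch of watching the trend rather than the absolute value: smooth whatever loss values you've recorded from the training printout (the list here is hypothetical) and check that the smoothed curve is still falling:

```python
def moving_average(values, window):
    """Simple moving average over a list of logged loss values."""
    return [sum(values[i - window:i]) / window
            for i in range(window, len(values) + 1)]

losses = [3.2, 3.1, 3.0, 2.8, 2.9, 2.7]    # hypothetical logged values
print(moving_average(losses, window=3))    # still decreasing -> keep training
```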

iedmrc commented 5 years ago

@minimaxir What about calculating a validation loss? As far as I can see, gpt-2-simple calculates the training loss but not a validation_loss. What would you suggest for evaluating the trained model?
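One hedged option: later gpt-2-simple releases added validation parameters to finetune() (val_dataset, val_every, val_batch_count); if your installed version has them, holding out a split of the titles gives a validation loss alongside the training loss. File names and step counts below are illustrative:

```python
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="hn_titles_train.txt",    # training split (placeholder name)
              model_name="117M",
              steps=5000,
              val_dataset="hn_titles_val.txt",  # held-out titles
              val_every=500,                    # compute validation loss every 500 steps
              val_batch_count=40)               # batches used per validation pass
```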