Closed gooofy closed 5 years ago
My corpus is 433 MB of articles (scraped from Twitter) - is this enough, too little, or too much?
That's a good start, I think. GPT-2 was trained on a much, much larger corpus, but 500 MB should already be OK.
I am currently at epoch 31 - loss is decreasing very very slowly, currently at 6.7 - is this to be expected?
This looks a bit odd. Sorry for asking, but are you sure it's epoch 31? I don't remember that the training code reported the number of epochs, could it be something else? Also, a loss of 6.7 looks high. For reference, below I'm showing some learning curves for a Russian corpus of around 4 GB of text, with 16k or 50k vocab size; the X axis is the number of tokens × 1e9 and the Y axis is loss, so it's much lower.
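To get a feel for why 6.7 is high: per-token cross-entropy loss converts directly to perplexity. A minimal sketch (plain Python, nothing specific to the training code here):

```python
import math

def perplexity(loss: float) -> float:
    """Convert a per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)

# A loss of 6.7 corresponds to a perplexity around 810, i.e. the model
# is still effectively choosing among hundreds of equally likely tokens.
print(round(perplexity(6.7)))
```

With a 50k vocab, a completely untrained model starts near loss ≈ ln(50000) ≈ 10.8, so 6.7 is progress, but well-trained models land much lower.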
I am using gpt-2-tf-train - is this expected to work or should I switch to the torch one for better/faster results?
It's expected to work at a similar speed to the pytorch one, but the pytorch version is a bit better developed and I plan to update only it, leaving TF as it is. For example, there is a web UI for the pytorch one but not for TF, and probably more training-related features as well.
hey, thanks for the quick and detailed reply, appreciate it! :)
This looks a bit odd. Sorry for asking, but are you sure it's epoch 31? I don't remember that training code reported number of epochs, could it be something else?
maybe I am confused here already :o) this is what I am currently looking at:
epoch 31: 61%|█████████████████████████████████████████▌ | 10338/16914 [1:59:03<54:54, 2.00it/s, step=686900, loss=6.81, avg=6.74]
(I did a run of 3 epochs before this one, hence I guess this is actually epoch 34 now.) Maybe this is different from the torch version of the training code (I will give that one a try very soon).
vocab size is 50k - but when I just checked it, I noticed my text corpus could definitely use some cleanup; this is what I will work on now before I start another run.
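For what it's worth, here is the kind of cleanup pass I have in mind before retraining the vocab - a sketch using only the standard library, with illustrative filters (URL stripping, whitespace collapsing, short-line and exact-duplicate removal); the exact rules will depend on what the scrape looks like:

```python
import re
import unicodedata

def clean_line(line):
    """Normalize and filter one line of a scraped text corpus (illustrative filters)."""
    line = unicodedata.normalize("NFC", line).strip()
    line = re.sub(r"https?://\S+", "", line)   # drop URLs
    line = re.sub(r"\s+", " ", line).strip()   # collapse whitespace
    if len(line) < 20:                         # drop very short fragments
        return None
    return line

def clean_corpus(lines):
    """Yield cleaned, de-duplicated lines from an iterable of raw lines."""
    seen = set()
    for raw in lines:
        line = clean_line(raw)
        if line and line not in seen:          # deduplicate exact repeats
            seen.add(line)
            yield line
```

Removing boilerplate and exact duplicates should also keep junk tokens out of the 50k vocab.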
for reference, here is what train + valid loss look like in tensorboard:
so I guess I am at 1.4e9 tokens now - and I should expect the loss to be much lower at this point.
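(Sanity-checking my own estimate: the tokens-seen figure is just step count × batch size × context length. Quick arithmetic with assumed batch/context values - mine may differ:)

```python
def tokens_seen(steps, batch_size, seq_len):
    """Total training tokens processed after `steps` optimizer steps."""
    return steps * batch_size * seq_len

# Step counter from the progress bar above, with an *assumed*
# batch size of 2 and context length of 1024.
print(tokens_seen(686_900, 2, 1024))  # 1406771200, i.e. ~1.4e9
```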
once again thanks for all those hints, that should help me with my next steps. I will report when I have new results. If you notice anything in the results I have posted above, please let me know :)
Thanks for posting the curves - it looks like it's indeed epoch 31, and the curves suggest either some bug in the TF training code, or a sub-optimal learning rate (too high?).
thanks again for the quick reply! :))
ok, I will definitely try torch training next (working on that setup right now :) )
I was suspicious about the learning rate settings - I did not specify anything here; what lr would you recommend?
For the pytorch code, I expect the default learning rate to be a good start, but I'll double-check the parameters of the runs I referenced above tomorrow.
quick status update: using the torch training code and a cleaned-up corpus (which also led to a much nicer vocab set), things look much better now:
will work on a much larger German corpus next
thanks again for your kind support, helped me a lot!! :)
Wow, that looks nice!! Good luck with the bigger corpus 👍
First of all: thanks for your efforts here, highly appreciated!
I am wondering if you have any ballpark figures on how many epochs and how much training material are required to train a GPT-2 model from scratch?
In my case, I am currently running an experiment training a German GPT-2 model and wondering if I am on the right track. Here is what I have:
Thanks and keep up the good work!