nshepperd / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"

Training from scratch? #11

Open bkj opened 5 years ago

bkj commented 5 years ago

I see that you provide code for fine-tuning the pretrained models. Do you think this code is also appropriate for training a model from scratch, or are there other repos you think would be more appropriate for from-scratch training?

Thanks!

nshepperd commented 5 years ago

I don't see why not. Well, I suppose it depends on exactly what you mean by "from scratch". For many tasks I would probably start with the released GPT-2 anyway and "fine-tune" it to a completely different task (like generating C code), because if nothing else the released model works as a good initialization, with the correct scales for all parameters.

If you want to do something like use a different embedding/encoding for a different kind of data (like generating a vocabulary specifically for C code rather than English prose), that would certainly be possible too, though I haven't added anything yet to specifically support that.
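For example, a C-specific byte-level BPE vocabulary could be trained along these lines (a rough sketch using the Hugging Face `tokenizers` package rather than anything in this repo; the file paths and vocab size are placeholders):

```python
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE vocabulary on a corpus of C source files.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus/util.c", "corpus/main.c"],  # placeholder paths
    vocab_size=32000,                          # placeholder size
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)

# Writes vocab.json and merges.txt to the given directory.
tokenizer.save_model("c_vocab")
```

You would still have to wire the resulting vocabulary into src/encoder.py (which expects the encoder.json / vocab.bpe format) and set n_vocab in the hparams to match, which is the part that isn't supported out of the box yet.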

If you don't want to use the released model at all, for instance because you want to train a model with incompatible hyperparameters, it should be sufficient to just skip the restore from the released model checkpoint (around train.py:164-177) on your first run, so that the parameters are all randomly initialized (see the sketch below).
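Concretely, the change amounts to something like this (a minimal sketch rather than the repo's exact code; `--from_scratch` is a hypothetical flag, and `sess`, `saver`, and `args` are the session, Saver, and parsed arguments that train.py already sets up around that block):

```python
import os
import tensorflow as tf

# All parameters start from their random initializers.
sess.run(tf.global_variables_initializer())

if not args.from_scratch:
    # Default path: restore the released GPT-2 weights and fine-tune from them.
    ckpt = tf.train.latest_checkpoint(os.path.join('models', args.model_name))
    saver.restore(sess, ckpt)
# With --from_scratch, the restore (around train.py:164-177) is skipped and
# training proceeds from the random initialization.
```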

carter54 commented 5 years ago

@nshepperd I see that code generation is mentioned here. Did you try GPT-2 on a code generation task? Thanks!