bkj opened this issue 5 years ago
I don't see why not. Well, I suppose it depends on exactly what you mean by "from scratch". For many tasks I would probably start from the released GPT-2 anyway and "fine tune" it to a completely different task (like generating C code), because if nothing else the released model works as a good initialization, with the correct scales for all parameters.
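For that route (keep the stock vocabulary, just change the data), the main preparation step is encoding the new corpus with the released BPE encoder. A rough sketch, not something shipped in this repo as-is; the exact `get_encoder` signature and the archive layout that `train.py`/`load_dataset.py` expect may differ between versions:

```python
# Sketch: encode a C-code corpus with the stock GPT-2 BPE vocabulary so it
# can be used to fine-tune the released model. Assumes src/encoder.py is
# importable and the 117M model files have been downloaded.
import glob
import numpy as np
import encoder  # src/encoder.py from this repo

enc = encoder.get_encoder("117M")

tokens = []
for path in glob.glob("c_corpus/**/*.c", recursive=True):
    with open(path, encoding="utf-8", errors="ignore") as f:
        tokens.extend(enc.encode(f.read()))
    tokens.append(enc.encoder["<|endoftext|>"])  # separate documents

np.savez_compressed("c_corpus.npz", np.array(tokens, dtype=np.int32))
```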
If you want to do something like using a different embedding/encoding for a different kind of data (like generating a vocabulary specifically for C code rather than English prose), that would certainly be possible too, though I haven't added anything yet to specifically support it.
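If you go the custom-vocabulary route, one option (outside this repo, purely as an illustration) is to train a byte-level BPE with the third-party Hugging Face `tokenizers` package; the vocab size and special token below are arbitrary choices:

```python
# Sketch: build a small byte-level BPE vocabulary for C source using the
# third-party `tokenizers` package (pip install tokenizers).
import glob
from tokenizers import ByteLevelBPETokenizer

tok = ByteLevelBPETokenizer()
tok.train(
    files=glob.glob("c_corpus/**/*.c", recursive=True),
    vocab_size=16000,
    special_tokens=["<|endoftext|>"],
)
tok.save_model("c_vocab")  # writes vocab.json and merges.txt

ids = tok.encode("int main(void) { return 0; }").ids
print(len(ids), ids[:10])
```

Note that with a new vocabulary you can't reuse the released embedding matrix, so `n_vocab` in the hparams has to change and the model has to be trained from scratch, which brings us to the next point.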
If you don't want to use the released model at all, for instance because you want to train a model with incompatible hyperparameters, it should be sufficient to skip the restore from the released model checkpoint (around train.py:164-177) on your first run, so that the parameters are all randomly initialized.
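Concretely, something along these lines (a sketch only; the hyperparameter values are placeholders, and it assumes the current `src/model.py` interface and TensorFlow 1.x):

```python
# Sketch: build the graph with custom hyperparameters and initialize the
# parameters randomly instead of restoring the released checkpoint.
import tensorflow as tf
import model  # src/model.py from this repo

hparams = model.default_hparams()
hparams.override_from_dict({
    "n_vocab": 16000,   # match your custom vocabulary size
    "n_ctx": 512,
    "n_embd": 384,
    "n_head": 6,
    "n_layer": 6,
})

context = tf.placeholder(tf.int32, [1, None])
output = model.model(hparams=hparams, X=context)

with tf.Session() as sess:
    # This replaces the saver.restore(...) of the released checkpoint:
    # every variable just gets its random initializer.
    sess.run(tf.global_variables_initializer())
```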
@nshepperd I see that code generation is mentioned here. Did you try GPT-2 on a code generation task? Thanks!
I see that you provide code for finetuning the pretrained models -- do you think that this code is also appropriate for training a model from scratch? Or are there other repos that you think would be more appropriate for from-scratch training?
Thanks!