shawwn / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"
MIT License

Loading large finetuned models? #14

Open · JadynHax opened this issue 4 years ago

JadynHax commented 4 years ago

I'm looking to load a finetuned 1.5B model to generate text (trained with a slightly modified version of the tpu-multi-snapshot branch), but there doesn't appear to be support for that. The code in generate_unconditional_samples.py and interactive_conditional_samples.py only seems to handle the pretrained models. I've tried adapting it myself to load them (borrowing large sections from train.py), but nothing I've tried works.

Any chance anyone's succeeded in doing this already?
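For anyone stuck on the same thing, one workaround sketch: since the sampling scripts only look under models/<model_name>, you can dress the finetuned checkpoint up as a pretrained model directory. Everything below is an assumption about paths (checkpoint/run1 for the finetuned run, models/1558M for the base model's tokenizer and hparams files), not a tested recipe:

```shell
# Assumption: checkpoint/run1 holds the finetuned weights and models/1558M
# holds the original encoder/hparams files; adjust both paths to your setup.
mkdir -p models/1558M-finetuned

# Copy the weight files (model-NNNN.*) plus the "checkpoint" index file
# that tf.train.latest_checkpoint reads to find the latest snapshot.
cp checkpoint/run1/model-* checkpoint/run1/checkpoint models/1558M-finetuned/

# The sampler also needs the tokenizer and hparams from the base model.
cp models/1558M/encoder.json models/1558M/vocab.bpe models/1558M/hparams.json \
   models/1558M-finetuned/

# Now the stock script can load it by name.
python src/interactive_conditional_samples.py --model_name 1558M-finetuned
```

One gotcha: the copied "checkpoint" file may still record the old run's path in its model_checkpoint_path line, so you may need to edit it to point at the new location before tf.train.latest_checkpoint will resolve correctly.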

Norod commented 4 years ago

@JadynHax I'm using Colab, so every time I restart it, I have to "continue from my own checkpoint". That's what I do, and it seems to work well for me:

[Screenshot (2020-07-26): Colab cell continuing training from a saved checkpoint]
JadynHax commented 4 years ago

@Norod Oh, I know how to continue from a checkpoint. I just wanted to know if there was a better way of actually using it to generate text besides popping it into ./train.py, setting --learning_rate to 0 and --sample_every to 1, and running it that way. It just seems counterintuitive to do it like that.
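For reference, the workaround described above would look roughly like this. The --learning_rate and --sample_every flags are the ones named in the thread; the --model_name and --restore_from flags, the paths, and the fact that train.py still expects a dataset argument are my assumptions about this repo's train.py, so check its --help output:

```shell
# --learning_rate 0 keeps the weights frozen; --sample_every 1 emits a text
# sample every step, so training degenerates into pure generation.
# Paths and the --restore_from flag are assumptions for illustration.
python train.py \
  --model_name 1558M \
  --restore_from ./checkpoint/run1 \
  --learning_rate 0 \
  --sample_every 1
```

As the comment above says, this works but is roundabout: you pay the overhead of the whole training loop just to sample.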

By the way, I've now migrated over to the regular tpu branch (which I don't know why I hadn't done in the first place). It produces a different error, but it still fails when loading my fine-tuned 1.5B model with ./src/interactive_conditional_samples.py. I don't have the exact error on hand, but IIRC it involved tensor shapes, with Dimension 0 being 1 vs. 10 somewhere. I can copy the exact error later if needed.

I can also make a public copy of the notebook for people to take a look at if needed.