minimaxir / aitextgen

A robust Python tool for text-based AI training and generation using GPT-2.
https://docs.aitextgen.io
MIT License
1.84k stars 218 forks

Fine Tuning GPT2-medium #71

Open AdaUchendu opened 3 years ago

AdaUchendu commented 3 years ago

Trying to fine-tune the GPT2-medium model, I get the error: `TypeError: __init__() got an unexpected keyword argument 'show_progressbar'`

When I install pytorch-lightning==0.8.4 as suggested in the other related issues, I get the following error instead: `RuntimeError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 15.90 GiB total capacity; 14.94 GiB already allocated; 33.88 MiB free; 14.96 GiB reserved in total by PyTorch)`

Installing fire==0.3.0, as suggested in another related issue, also returns the RuntimeError above.

How can I get around these issues? The suggested solution, installing the pinned versions below, does not work:

```sh
pip3 install pytorch-lightning==0.7.6
pip3 install transformers==2.9.1
pip3 install fire==0.3.0
```

Thank you.

minimaxir commented 3 years ago

Fine-tuning GPT2-Medium only seems possible with FP16 training so far, even with the fixes I'll be adding in 0.3.0.

Ideally there needs to be a more correct gradient-accumulation implementation like the one in gpt-2-simple, but that's not easy to do. (#6)
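(The idea behind gradient accumulation, in a minimal plain-Python sketch with a hypothetical toy loss: summing gradients over several micro-batches before taking one optimizer step is numerically equivalent to a step over the full batch, which is how a large effective batch size can fit in a tight GPU memory budget.)

```python
def grad(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x

def step_full_batch(w, batch, lr=0.01):
    """One optimizer step using the mean gradient over the full batch."""
    g = sum(grad(w, x, y) for x, y in batch) / len(batch)
    return w - lr * g

def step_accumulated(w, batch, micro_size=2, lr=0.01):
    """Accumulate gradients over micro-batches, then take one step."""
    acc = 0.0
    for i in range(0, len(batch), micro_size):
        micro = batch[i:i + micro_size]
        acc += sum(grad(w, x, y) for x, y in micro)
    return w - lr * acc / len(batch)

batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 7.0)]
w_full = step_full_batch(1.0, batch)
w_accum = step_accumulated(1.0, batch)
print(abs(w_full - w_accum) < 1e-12)  # the two updates match
```

The hard part in practice (and the subject of #6) is making the framework's accumulation behave like this exact averaging, rather than just calling `backward()` repeatedly with mismatched normalization.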

AdaUchendu commented 3 years ago

Hi Max,

Thank you for the suggestion. I tried using gpt-2-simple to fine-tune gpt2-large and gpt2-xl. It worked for gpt2-large, but my Colab notebook always times out within less than 4 hours of fine-tuning gpt2-xl. Also, when I start to generate text with the fine-tuned gpt2-large model, my Colab notebook always times out within an hour. I have tried this multiple times and the same thing happens. Do you have any suggestions for what I can do to get around these issues? I would be content with at least getting the fine-tuned gpt2-large model to generate.

Thank you.

minimaxir commented 3 years ago

I'm less of an expert on gpt2-xl, unfortunately.

The best way to avoid timeouts would be to pay for a dedicated GPU via Google Cloud Platform or use Colab Pro. Unfortunately it's more of a you-get-what-you-pay-for situation. (Which is why I did a fresh start with aitextgen: to see if working with models can be more efficient.)

AdaUchendu commented 3 years ago

I see your point. I have Colab Pro, which is why the timeouts are kind of odd. By the way, aitextgen works very well for gpt2-small and gpt2-medium. It is very efficient.

jgoodrich77 commented 3 years ago

@AdaUchendu I've used this repo in Google Colab Pro for a few months now to fine-tune the XL (1558M) GPT-2 model without issue: https://github.com/drfinkus/gpt-2-simple

It's a fork of the minimaxir package, just with a few updates that ensure it'll fine-tune the larger model without error.

AdaUchendu commented 3 years ago

> @AdaUchendu I've used this repo for a while now in Google Colab Pro to fine tune the XL (1558M) GPT-2 model without issue for a few months: https://github.com/drfinkus/gpt-2-simple
>
> It's a port of the minimaxir package, just with a few updates that ensure it'll fine tune without error over the larger model.

Thank you, Jeremy. Do you have an idea of how to continue training from saved checkpoints after Google Colab times out?
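(A common pattern for this with gpt-2-simple in Colab is to persist checkpoints to Google Drive and restore them after a timeout; a sketch below, where the `run_name`, dataset filename, and step count are placeholder assumptions:)

```python
import gpt_2_simple as gpt2

# In Colab: mount Google Drive and pull the checkpoint saved before the timeout.
gpt2.mount_gdrive()
gpt2.copy_checkpoint_from_gdrive(run_name="run1")

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="input.txt",
              model_name="774M",        # gpt2-large
              run_name="run1",
              restore_from="latest",    # resume from the checkpoint, not from scratch
              steps=1000)

# Persist the new checkpoint back to Drive before the next timeout.
gpt2.copy_checkpoint_to_gdrive(run_name="run1")
```

This requires a Colab runtime with gpt-2-simple and TensorFlow installed, so it is a sketch rather than something runnable locally.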

redthing1 commented 3 years ago

This is now fully working in 0.4.1!