redthing1 opened this issue 3 years ago
That's weird. Are you changing any other training settings?
Everything else is defaults. I tried again using a fresh copy of your notebook and 125M now works, but 350M still OOMs.
Having this issue too. If it matters, I'm using a pretty large text file (~20 MB) as the dataset, and I'm also getting this warning a short while after training starts:
Token indices sequence length is longer than the specified maximum sequence length for this model (2385 > 2048). Running this sequence through the model will result in indexing errors
This also happened in my attempts to train GPT-Neo locally, so it doesn't seem like it's endemic to Colab.
That warning just means one of your training samples has a token count above the model's maximum context length; it's not the same as a GPU OOM. I recommend running the tokenizer over your samples to find whichever sequence is causing it.
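A minimal sketch of that check, assuming a plain-text dataset with one training sample per line (the file name `dataset.txt` is a placeholder):

```python
from transformers import GPT2Tokenizer

# GPT-Neo uses the GPT-2 tokenizer; its model_max_length is 2048
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
max_len = tokenizer.model_max_length

with open("dataset.txt", encoding="utf-8") as f:
    for i, line in enumerate(f):
        n_tokens = len(tokenizer.encode(line))
        if n_tokens > max_len:
            # this is the kind of sample that triggers the
            # "Token indices sequence length..." warning
            print(f"line {i}: {n_tokens} tokens (> {max_len})")
```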
Alright, I'll check that out, but I am also definitely OOMing
Using Colab, I get OOM finetuning GPT-Neo (both 125M and 350M) on both T4 and P100. Even when I enable `fp16`, the problem persists. GPT-2, on the other hand, works fine.
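For what it's worth, a rough sketch of the memory-saving settings that usually let GPT-Neo fit on a 16 GB T4/P100, assuming a standard Hugging Face Trainer setup (the checkpoint name, output directory, and accumulation steps here are placeholders, not from this thread):

```python
from transformers import GPTNeoForCausalLM, TrainingArguments

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
model.gradient_checkpointing_enable()  # trade recompute for activation memory

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # smallest possible per-step batch
    gradient_accumulation_steps=8,   # keep the effective batch size up
    fp16=True,                       # half-precision activations/gradients
)
```

`fp16` alone only halves activation/gradient memory, so gradient checkpointing and a batch size of 1 are typically also needed before the larger checkpoints stop OOMing.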