loubnabnl / santacoder-finetuning

Fine-tune SantaCoder for Code/Text Generation.
Apache License 2.0

CUDA OOM even with batch size 1 and lower seq length on 16GB GPU #16

Closed · noobmldude closed this 1 year ago

noobmldude commented 1 year ago

I tried the fine-tuning script on a single V100 GPU with 16 GB of GPU memory and more than 200 GB of system RAM. I still get CUDA OOM.

Should it be possible to fine-tune on a single V100 GPU? Am I doing something wrong? Any tricks to get it running are very much appreciated.

loubnabnl commented 1 year ago

Can you make sure gradient checkpointing is turned on? You can also use half precision or reduce the context length to 1024 instead of 2048. You can check this Google Colab notebook for SantaCoder fine-tuning, where training fits on a T4.
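For reference, here is a minimal sketch of those memory-saving flags using the Hugging Face `TrainingArguments` API. The output directory, accumulation steps, and learning rate are illustrative assumptions, not values from this repo's script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

# SantaCoder ships custom modeling code, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained("bigcode/santacoder", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("bigcode/santacoder")

training_args = TrainingArguments(
    output_dir="./santacoder-finetuned",  # hypothetical path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,       # keep the effective batch size reasonable
    gradient_checkpointing=True,          # recompute activations to save memory
    fp16=True,                            # half precision
    learning_rate=5e-5,
)

# When tokenizing, cap the context length at 1024 instead of 2048:
batch = tokenizer("def hello():", truncation=True, max_length=1024, return_tensors="pt")
```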

bowencarry commented 7 months ago

I have the same problem. Did you solve it?

loubnabnl commented 7 months ago

I recommend checking this code, which uses quantization and PEFT to reduce the memory footprint: https://github.com/bigcode-project/starcoder2/blob/main/finetune.py
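Roughly, that approach combines 4-bit quantization (bitsandbytes) with a LoRA adapter (PEFT) so only a small set of adapter weights is trained. Below is a minimal sketch of that pattern applied to SantaCoder; the LoRA hyperparameters and the target module name are assumptions for illustration, not values taken from the linked script:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model in 4-bit precision to shrink the memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/santacoder",
    quantization_config=bnb_config,
    trust_remote_code=True,
)

# Attach a LoRA adapter; "c_attn" is an assumed attention-projection module
# name for this GPT-2-style architecture.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```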