Clarification of required GPU memory to pretrain?

Approximately how much GPU memory is required to pretrain? We're running on a single GPU (10.76 GiB total capacity) and get the following out-of-memory error even with a batch size of 1:

RuntimeError: CUDA out of memory. Tried to allocate 296.00 MiB (GPU 0; 10.76 GiB total capacity; 6.34 GiB already allocated; 206.56 MiB free; 9.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Do we just need to move to a GPU with more memory? Or are we doing something wrong?
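
For a rough sense of scale, the sketch below estimates a lower bound on training memory and computes the gap between reserved and allocated memory from the error above. It assumes plain fp32 training with Adam (4 bytes of weights, 4 bytes of gradients, and 8 bytes of optimizer state per parameter) and ignores activations, the CUDA context, and allocator overhead, so real usage will be higher. The parameter count is a hypothetical placeholder, not a figure taken from this repository, and both function names are illustrative only.

```python
GIB = 1024 ** 3


def estimate_training_memory_gib(num_params: int, bytes_per_param: int = 16) -> float:
    """Lower bound for weights + gradients + Adam state, in GiB.

    Assumption: fp32 training with Adam, i.e. 4 (weights) + 4 (gradients)
    + 8 (two Adam moment buffers) = 16 bytes per parameter.
    Activations and framework overhead are NOT included.
    """
    return num_params * bytes_per_param / GIB


def reserved_gap_gib(reserved_gib: float, allocated_gib: float) -> float:
    """Memory held by the caching allocator but not used by live tensors."""
    return reserved_gib - allocated_gib


# Hypothetical model size, for illustration only.
hypothetical_params = 124_000_000
print(f"static lower bound: {estimate_training_memory_gib(hypothetical_params):.2f} GiB")

# Figures copied from the error message above.
print(f"reserved minus allocated: {reserved_gap_gib(9.41, 6.34):.2f} GiB")
```

If the static lower bound is well under the card's capacity, the remaining memory is most likely going to activations, which grow with sequence length as well as batch size. A gap of several GiB between reserved and allocated memory is the situation in which the error message suggests trying `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF`.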

tomekkorbak / pretraining-with-human-feedback
