huggingface / transfer-learning-conv-ai

🦄 State-of-the-Art Conversational AI with Transfer Learning
MIT License

"Cuda Out of Memory" when running train.py #103

Closed CiaraG98 closed 3 years ago

CiaraG98 commented 3 years ago

Is there a way to decrease the amount of memory needed when training?

I get: RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 1.96 GiB total capacity; 1.55 GiB already allocated; 16.69 MiB free; 85.63 MiB cached)

when attempting to train with my own persona dataset. Decreasing the batch size and increasing the gradient_accumulation_steps parameter makes no difference. I even set the batch size to 1 and it still runs out of memory.

Does anyone have any other ideas?
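For anyone puzzled why gradient accumulation is the usual fix here: averaging micro-batch gradients over several steps gives the same update as one large batch, while the GPU only ever holds one micro-batch. A minimal pure-Python sketch (illustrative names, not from the repo's train.py; a toy linear model with an analytic MSE gradient stands in for the transformer):

```python
# Gradient accumulation sketch: micro-batch gradients averaged over
# accumulation steps equal the full-batch gradient, so memory scales
# with the micro-batch size, not the effective batch size.

def mse_grad(w, samples):
    """Gradient of mean((w*x - y)^2) with respect to w over a batch."""
    return sum(2 * (w * x - y) * x for x, y in samples) / len(samples)

def accumulated_grad(w, samples, micro_batch_size):
    """Average micro-batch gradients, as the optimizer step would see them."""
    micro_batches = [samples[i:i + micro_batch_size]
                     for i in range(0, len(samples), micro_batch_size)]
    return sum(mse_grad(w, mb) for mb in micro_batches) / len(micro_batches)

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 6.0)]
w = 0.5
full = mse_grad(w, data)              # batch size 4, one step
accum = accumulated_grad(w, data, 1)  # batch size 1, 4 accumulation steps
assert full == accum                  # identical update for the optimizer
```

This is also why the combination "batch size 1 plus accumulation" normally works: if it still OOMs, a single sample's activations plus the model already exceed the card, and no batch-size knob will help.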

CiaraG98 commented 3 years ago

@thomwolf would you be able to offer some tips? My dialog dataset is built from tweets, so each tweet is one utterance in the dataset. I only have about 6 or 7 personas, so it is not that large.

CiaraG98 commented 3 years ago

Running on CPU with batch_size=2 and gradient_accumulation_steps=4 seemed to do the trick.
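The CPU fallback makes sense given the card in the error message: a rough back-of-envelope (assuming fp32 Adam training of a ~117M-parameter model, roughly the size of the OpenAI GPT this repo fine-tunes) shows that weights, gradients, and optimizer state alone nearly fill a 2 GiB GPU, before counting a single activation, which is why no batch size helped:

```python
# Back-of-envelope GPU memory for fp32 training with Adam.
# Per parameter: 4 bytes weight + 4 bytes gradient + 8 bytes for the
# two Adam moment buffers (m and v) = 16 bytes, activations excluded.
params = 117_000_000                    # assumed model size (~OpenAI GPT)
bytes_per_param = 4 + 4 + 4 + 4         # weight, grad, Adam m, Adam v
total_gib = params * bytes_per_param / 2**30
print(f"{total_gib:.2f} GiB")           # → 1.74 GiB, vs. 1.96 GiB capacity
```

System RAM is usually far larger than 2 GiB, so moving to CPU trades speed for headroom, matching the result above.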