richarddwang / electra_pytorch

Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)

RAM usage during training #10

Closed (Tiiiger closed this issue 3 years ago)

Tiiiger commented 3 years ago

hi @richarddwang

Thank you for putting up this repo. This is truly great work.

I want to ask the environment you used to run this code. How much CPU RAM did you use?

I tried to run the training with 50GB RAM but got OOM after 10K steps.

Is this expected?
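
(For anyone measuring this: a minimal sketch of how resident CPU RAM can be logged during training with psutil. The helper name and logging interval are illustrative, not from this repo.)

```python
import os
import psutil

def log_ram(step):
    """Print this process's resident memory; a steady climb across steps
    (e.g. toward OOM after ~10K steps) points to something accumulating."""
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024**3
    print(f"step {step}: {rss_gb:.2f} GB resident")

# e.g. inside the training loop:
# if step % 1000 == 0:
#     log_ram(step)
```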

richarddwang commented 3 years ago

Hi @Tiiiger, I didn't track CPU RAM usage, but I was able to train it on a server with 4*16GB DDR4.

amritalok commented 3 years ago

@richarddwang Can you share your system setup? I am trying to train this on a 2080 Ti, but a batch size of 128 doesn't fit on the GPU. How were you able to run a batch size of 128?
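
(Aside: if a batch of 128 genuinely doesn't fit, a common workaround is gradient accumulation, i.e. splitting the batch into micro-batches and stepping the optimizer once per full batch. A minimal, self-contained PyTorch sketch with stand-in model and data, not code from this repo:)

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real model and data.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(8)]

accum_steps = 4  # 4 micro-batches of 32 -> effective batch size 128
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y) / accum_steps  # scale so accumulated grads average
    loss.backward()                            # gradients add up across micro-batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()                       # one update per effective batch
        optimizer.zero_grad()
```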

richarddwang commented 3 years ago

For "ELECTRA-small", I can train it on a server that has

CPU: Intel Xeon Silver 4116 @ 2.10GHz *2 (24 cores)
Memory: DDR4 8GB *12
GPU: GTX 1080 Ti *4

and I use only one of the four GPUs.

The peak CUDA memory usage is about 1 GB.
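
(A number like this can be checked with PyTorch's built-in counters. A minimal sketch, assuming a CUDA device is available:)

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run some training steps here ...

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak CUDA memory allocated: {peak_gb:.2f} GB")
```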

amritalok commented 3 years ago

Thanks! I am training on a cluster with 2080 Tis and I get a CUDA out-of-memory error. Any suggestions on what could be wrong? I am using the same setup as yours.

richarddwang commented 3 years ago

Sorry for the late reply. I retested the code and it still works (cuda:3 was used in the result below).

[screenshot: GPU memory usage on cuda:3]

I haven't come up with anything that might cause a CUDA OOM error in your case. If I were you, I might step through the training loop with ipdb and check where GPU memory grows.
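
(A minimal sketch of that kind of inspection; the helper name report_cuda_memory is illustrative, not part of the repo:)

```python
import torch

def report_cuda_memory(tag=""):
    """Print current and peak CUDA memory; call between training steps
    (or from an ipdb prompt) to locate where memory jumps."""
    alloc = torch.cuda.memory_allocated() / 1024**2
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print(f"{tag}: allocated={alloc:.0f} MB, peak={peak:.0f} MB")

# Inside the training loop:
#   import ipdb; ipdb.set_trace()   # then call report_cuda_memory("after forward")
```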

amritalok commented 3 years ago

I did try it on Colab, which also gave the memory error. I'll give it a shot with ipdb. Thanks a lot!

richarddwang commented 3 years ago

Please tag me if you find any problem. Closing the issue.