Closed: Tiiiger closed this issue 3 years ago
Hi @Tiiiger, I didn't keep track of CPU RAM usage, but I can train it on a server with 4×16 GB DDR4.
@richarddwang Can you share your system setup? I am trying to train this on a 2080 Ti, but a batch size of 128 doesn't fit on the GPU. How were you able to run a batch size of 128?
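In case it helps while waiting for a reply: a common generic PyTorch workaround when a batch of 128 doesn't fit in GPU memory is gradient accumulation — run several smaller micro-batches and step the optimizer once. This is a minimal sketch of the pattern, not code from this repo; the tiny model and random data are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder model and loss, just to illustrate the accumulation pattern.
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

target_batch = 128   # effective batch size we want
micro_batch = 32     # what actually fits on the GPU
accum_steps = target_batch // micro_batch

opt.zero_grad()
for step in range(accum_steps):
    x = torch.randn(micro_batch, 10)            # placeholder inputs
    y = torch.randint(0, 2, (micro_batch,))     # placeholder labels
    loss = loss_fn(model(x), y) / accum_steps   # scale so gradients average out
    loss.backward()                             # grads accumulate across micro-batches
opt.step()
opt.zero_grad()
```

Mathematically this matches a single batch of 128 for losses that average over the batch, at the cost of `accum_steps` forward/backward passes per optimizer step.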
For "ELECTRA-small", I can train it on a server that has:
CPU: Silver 4116 @ 2.10 GHz × 2 (24 cores)
Memory: DDR4 8 GB × 12
GPU: GTX 1080 Ti × 4 (only one GPU is used)
The peak CUDA memory usage is about 1 GB.
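A peak figure like this can be checked with PyTorch's built-in CUDA memory counters. A minimal sketch (the device index and tensor sizes are arbitrary placeholders; it falls back gracefully when no GPU is present):

```python
import torch

if torch.cuda.is_available():
    dev = torch.device("cuda:0")
    torch.cuda.reset_peak_memory_stats(dev)  # start the peak counter from zero

    # Placeholder workload; in practice, run a training step here instead.
    x = torch.randn(1024, 1024, device=dev)
    y = x @ x

    peak_mb = torch.cuda.max_memory_allocated(dev) / 1024**2
    print(f"peak CUDA memory: {peak_mb:.1f} MB")
else:
    print("no CUDA device available")
```

`max_memory_allocated` reports tensor allocations only; `nvidia-smi` will show a somewhat larger number because it includes the CUDA context and the caching allocator's reserved blocks.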
Thanks! I am training on a cluster with a 2080 Ti and I get the CUDA memory error. Any suggestions on what could be wrong? I am using the same setup as yours.
Sorry for the late reply. I retested the code and it still works (cuda:3 was used in the following result).
I haven't come up with anything that might cause a CUDA OOM error. If I were in your case, I might step through the code with ipdb to see where memory grows.
I did try on Colab which gave the memory error. I'll give it a shot with ipdb. Thanks a lot!
Please tag me if you find any problem. Closing the issue.
hi @richarddwang
Thank you for putting up this repo. This is truly great work.
I want to ask about the environment you used to run this code. How much CPU RAM did you use?
I tried to run the training with 50GB RAM but got OOM after 10K steps.
Is this expected?
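One way to confirm whether CPU RAM is actually the culprit is to log the process's peak resident memory periodically during training. A minimal sketch using only the stdlib `resource` module (Unix-only; the helper name is mine, and the KB-vs-bytes handling of `ru_maxrss` differs by platform):

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process, in MB.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS,
    so normalize before converting.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        rss //= 1024  # bytes -> KB
    return rss // 1024  # KB -> MB

# Example: call this every N training steps to see whether RSS keeps climbing.
print(f"peak RSS: {peak_rss_mb()} MB")
```

If the number grows roughly linearly with steps, a common cause in PyTorch training loops is keeping references to graph-attached tensors across steps (e.g. accumulating `loss` instead of `loss.item()` for logging).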