openai / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"
https://openai.com/blog/better-language-models/

Out of Memory Error #213

Open slavik0329 opened 5 years ago

slavik0329 commented 5 years ago

I get the below errors when running gpt-2. The model runs in the end and seems to work, but is there any way to fix this?

Thanks!

2019-11-19 17:13:26.908876: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-19 17:13:26.913409: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 4294967296
2019-11-19 17:13:26.917233: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 3865470464 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-19 17:13:26.921787: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 3865470464
2019-11-19 17:13:26.925608: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 3478923264 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-11-19 17:13:26.930056: W .\tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 3478923264
LoganDark commented 5 years ago

You need more GPU memory.

Lower your batch size or, if it's already 1, you'll have to use CPU / switch to a smaller model.
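To see why the larger checkpoints strain an 11 GB card, a rough back-of-the-envelope helps: the fp32 weights alone take 4 bytes per parameter, before activations, gradients, or optimizer state. A minimal sketch, using the parameter counts implied by the released model names:

```python
# Rough fp32 weight footprint for the released GPT-2 checkpoints.
# Actual memory use is considerably higher once activations,
# gradients, and optimizer state are added during training.
for name, params in [("124M", 124e6), ("355M", 355e6),
                     ("774M", 774e6), ("1558M", 1558e6)]:
    gb = params * 4 / 1024**3  # 4 bytes per fp32 parameter
    print(f"{name}: ~{gb:.1f} GB of weights")
```

The 1558M model's weights alone are close to 6 GB, which is why batch size 1 can already be tight once the rest of the training state is allocated.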

slavik0329 commented 5 years ago

I’m using an RTX 2080 Ti. Does that make sense with this card?


LoganDark commented 5 years ago

In my opinion you should really be using a workstation card rather than a gaming card.

Personally I've always had to use CPU since I haven't been graced with the presence of the Nvidia Gods and my laptop has an AMD GPU. Dedicated, but it's AMD so TF doesn't like it.

CPU training works just fine (I get about one iteration every 30 seconds), it's just slower and tends to bring down the rest of your system too. :/

Teravus commented 4 years ago

I know this is old, but I want to point out that you can run GPT-2 inference on an RTX 2080 Ti in Linux, but not Windows. The reason is that Windows reserves a portion of the VRAM for display under the WDDM 2 driver, even if no monitors are hooked up to the card. This is out of your control, and you can't switch to the compute-only driver on gaming cards. On the 2080 Ti, that reservation ends up being about 1 GB of VRAM. On Linux, a much smaller amount is reserved, depending on what you actually have running. The largest GPT-2 model barely fits on an 11 GB card, so that 1 GB of reserved VRAM on Windows puts it over the edge.
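The budget described above can be sketched numerically. The WDDM reservation is the approximate ~1 GB figure from this thread, and the model footprint is a hypothetical near-limit number for illustration, not a measurement:

```python
# Back-of-the-envelope VRAM budget for an 11 GB card (RTX 2080 Ti).
card_gb = 11.0
wddm_reserved_gb = 1.0      # approx. Windows display-driver reservation
model_footprint_gb = 10.5   # hypothetical near-limit GPT-2 footprint

fits_on_linux = model_footprint_gb <= card_gb                      # nearly all 11 GB usable
fits_on_windows = model_footprint_gb <= card_gb - wddm_reserved_gb # ~10 GB usable
print(fits_on_linux, fits_on_windows)
```

With those assumed numbers, the same workload fits on Linux but not on Windows, matching the behavior described above.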

sowich commented 3 years ago

Is it possible to somehow increase GPU memory? There is plenty of free system RAM. I'm trying to train the 1558M model on an RTX 2080 Ti, but I keep getting out-of-memory errors.

Wyldhunt commented 3 years ago

@sowich GPU memory is built into your GPU and can't be upgraded. If you need more, your only options are to buy a GPU with more memory, or to add a second GPU and split the work across both cards (note that SLI does not pool VRAM for compute workloads; TensorFlow addresses each card's memory separately). Your RAM is used by your CPU, so if you run on the CPU instead of the GPU, you'll likely see a large spike in RAM usage.
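One way to force the CPU fallback described above is to hide the GPU from TensorFlow before it initializes. Setting `CUDA_VISIBLE_DEVICES` to an empty string is a standard CUDA mechanism for this; a minimal sketch:

```python
import os

# Hide all CUDA devices so TensorFlow falls back to the CPU.
# This must run before TensorFlow is imported/initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

Equivalently, the variable can be set on the command line when launching one of the repo's sample scripts, e.g. `CUDA_VISIBLE_DEVICES="" python src/interactive_conditional_samples.py`.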