neulab / xnmt

eXtensible Neural Machine Translation

Problem running GPU version on RTX 2070 with 8GB Memory #584

Closed rrmariani closed 4 years ago

rrmariani commented 4 years ago

I ran the test successfully without a GPU. With the GPU, cudaMalloc crashes, even though DyNet successfully detects the GPU card and its memory.

How to fix this?

Here is the log:

(nmt) user@user-desktop:~/nmt/xnmt/recipes/stanford-iwslt$ xnmt --dynet-gpu ./config.en-vi.yaml
[dynet] initializing CUDA
[dynet] CUDA driver/runtime versions are 10.1/10.1
Request for 1 GPU ...
[dynet] Device Number: 0
[dynet] Device name: GeForce RTX 2070 SUPER
[dynet] Memory Clock Rate (KHz): 7001000
[dynet] Memory Bus Width (bits): 256
[dynet] Peak Memory Bandwidth (GB/s): 448.064
[dynet] Memory Free (GB): 7.7246/8.33533
[dynet]
[dynet] Device(s) selected: 0
[dynet] random seed: 3842097158
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
running XNMT revision ca4eac4 on user-desktop on 2019-12-31 15:35:56
=> Running iwslt-experiment

use randomly initialized DyNet weights of all components
DyNet param count: 11341853
Training
Starting to read ./train.en and ./train.vi
Done reading ./train.en and ./train.vi. Packing into batches.
Done packing batches.
The dy.parameter(...) call is now DEPRECATED. There is no longer need to explicitly add parameters to the computation graph. Any used parameter will be added automatically.

[iwslt-experiment] Epoch 0.0077: train_loss/word=7.530485 (words=31828, words/sec=11526.45, time=0-00:00:10)

etc...

[iwslt-experiment] Epoch 0.3658: train_loss/word=5.749202 (words=1219654, words/sec=18635.72, time=0-00:01:29)
CUDA failure in cudaMalloc(&ptr, n)
out of memory

Memory pool info for each devices:
Device GPU:0 - FOR Memory 4192MB, BACK Memory 2464MB, PARAM Memory 192MB, SCRATCH Memory 128MB.
Device CPU - FOR Memory 32MB, BACK Memory 32MB, PARAM Memory 32MB, SCRATCH Memory 32MB.
CUDA is unable to allocate enough GPU memory on GPU:0, at current stage only 2 MB out of 7949 MB is free. Note due to hardware limitations not all free memories can be allocated.
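(For context: DyNet manages GPU memory in separate pools for the forward pass (FOR), backward pass (BACK), parameters (PARAM), and scratch space (SCRATCH); the pools grow on demand from the initial 512MB until the device is exhausted, which is what happened here. Pool sizes can also be set up front with the --dynet-mem flag, which XNMT passes through to DyNet. A sketch of the two forms; the exact values are assumptions, not a recommendation:

```shell
# Single value in MB, split across the pools automatically:
xnmt --dynet-gpu --dynet-mem 4096 ./config.en-vi.yaml

# Comma-separated values size the pools individually (FOR,BACK,PARAM, and on
# newer DyNet versions SCRATCH), in MB:
xnmt --dynet-gpu --dynet-mem 2048,2048,512,512 ./config.en-vi.yaml
```

Note that presizing the pools only helps against fragmentation; since the pools here had already grown to fill nearly the whole 8GB card, the model simply needs less memory per step, e.g. via a smaller batch size.)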

rrmariani commented 4 years ago

I changed the batch size from 64 to 32 and the system seems OK now...
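(In XNMT the batch size is set on the batcher in the experiment's YAML config. A sketch of the relevant fragment; the surrounding keys and the !SrcBatcher type are assumptions based on typical XNMT recipes, so check them against the actual config.en-vi.yaml:

```yaml
train: !SimpleTrainingRegimen
  batcher: !SrcBatcher
    batch_size: 32   # was 64; halving it roughly halves activation memory
```

Halving the batch size halves the activations DyNet must keep for the backward pass, which is what shrinks the FOR and BACK pools enough to fit on an 8GB card.)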