tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

GPU not fully utilized #476

Open nashid opened 4 years ago

nashid commented 4 years ago

I am running the training with the following command:

python -m nmt.nmt \
    --src=vi --tgt=en \
    --vocab_prefix=/tmp/nmt_data/vocab \
    --train_prefix=/tmp/nmt_data/train \
    --dev_prefix=/tmp/nmt_data/tst2012 \
    --test_prefix=/tmp/nmt_data/tst2013 \
    --out_dir=/tmp/nmt_model \
    --num_train_steps=12000 \
    --steps_per_stats=100 \
    --num_layers=2 \
    --num_units=128 \
    --dropout=0.2 \
    --metrics=bleu \
    --num_gpus=1

I have one GPU (an AMD Radeon RX 580). While running the experiment, I see that the CPUs are fully utilized but GPU usage remains insignificant (<5%).

I saw this in the log:

Devices visible to TensorFlow: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456)]

Can anyone provide a pointer on why GPU usage remains low?
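For what it's worth, the device types can be read straight out of that log line. A minimal sketch (the `visible_device_types` helper below is hypothetical, not part of nmt or TensorFlow):

```python
import re

def visible_device_types(device_log: str) -> set:
    """Extract device types (e.g. CPU, GPU) from a TensorFlow device-list log line."""
    return set(re.findall(r"device:([A-Z_]+):\d+", device_log))

log = ("[_DeviceAttributes(/job:localhost/replica:0/task:0/"
       "device:CPU:0, CPU, 268435456)]")
print(visible_device_types(log))  # → {'CPU'}
```

Since only a CPU device appears in the listing, every op in the training graph is being placed on the CPU, which matches the utilization you observed.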

neel04 commented 3 years ago

@nashid You don't have CUDA, which is required to run TensorFlow ops on the GPU and use its compute cores. However, CUDA is proprietary to Nvidia (AMD GPUs use OpenCL instead). So in a nutshell, you need CUDA to run TensorFlow on a GPU, and to run CUDA you need an Nvidia GPU. If you are not willing to buy one, you can always use Google Colab, which provides free GPU resources.

Since TF cannot find your GPU, it automatically falls back to the CPU, which takes much longer to train on. Hence the high CPU usage and low GPU usage.
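One way to catch this fallback early is to list TensorFlow's local devices before training and warn if no GPU is visible, instead of silently training on CPU. A minimal sketch (the `gpu_device_names` helper is hypothetical; it assumes the TF 1.x `device_lib` API that produced the log line above, and degrades gracefully if TensorFlow is not installed):

```python
def gpu_device_names(local_devices):
    """Return the names of GPU devices from a device_lib listing."""
    return [d.name for d in local_devices if d.device_type == "GPU"]

try:
    # Same API that produced the _DeviceAttributes log line in the issue.
    from tensorflow.python.client import device_lib
    devices = device_lib.list_local_devices()
except ImportError:
    devices = []  # TensorFlow not installed in this environment

if not gpu_device_names(devices):
    print("No GPU visible to TensorFlow; training would fall back to CPU.")
```

Running a check like this before kicking off a 12000-step training job makes the CPU fallback obvious up front rather than after hours of slow training.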