tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial

Printed log shows /device:GPU:0 when in fact I am using CPU #312

Open shawnzaru opened 6 years ago

shawnzaru commented 6 years ago

The log message I get when training the Luong NMT model is misleading, since the server I am using does not have a GPU.

```
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (2048, 4096), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (4096,), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (2048, 4096), /device:GPU:0
  dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (4096,), /device:GPU:0
  dynamic_seq2seq/decoder/memory_layer/kernel:0, (1024, 1024),
  dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (3072, 4096), /device:GPU:0
  dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (4096,), /device:GPU:0
  dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (2048, 4096), /device:GPU:0
  dynamic_seq2seq/decoder/attention/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (4096,), /device:GPU:0
  dynamic_seq2seq/decoder/attention/luong_attention/attention_g:0, (), /device:GPU:0
  dynamic_seq2seq/decoder/attention/attention_layer/kernel:0, (2048, 1024), /device:GPU:0
  dynamic_seq2seq/decoder/output_projection/kernel:0, (1024, 16793),
# log_file=/home/wangqing/bench/nmt/nt-u1024/log_1524133455
  created train model with fresh parameters, time 0.48s
  created infer model with fresh parameters, time 0.23s
```
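As far as I can tell, the device string printed here is the placement *requested* in the graph, not where the op actually runs: with soft placement enabled, TensorFlow silently falls back to CPU when no GPU is present. A minimal probe (TF 1.x, not from the nmt code) that shows the difference:

```python
import tensorflow as tf

# Hypothetical probe, not from the nmt codebase. With no visible GPU and
# allow_soft_placement=True, an op requested on /gpu:0 silently runs on
# CPU, while the graph still records the requested device string.
# log_device_placement=True makes the runtime print where ops really run.
config = tf.ConfigProto(allow_soft_placement=True,
                        log_device_placement=True)
with tf.device("/gpu:0"):
  v = tf.Variable(tf.zeros([2, 2]), name="probe")

with tf.Session(config=config) as sess:
  sess.run(tf.global_variables_initializer())
  print(v.device)  # prints "/device:GPU:0", the *requested* device
```

On a CPU-only machine, the runtime's placement log shows the variable's ops on CPU even though `v.device` still reads `/device:GPU:0`, which would explain the log above.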

I am also confused by the fact that even when I try to force CPU training by setting

```
export CUDA_VISIBLE_DEVICES=""
```

and adding `--num_gpus=0`, the code still tries to place the encoder and decoder embeddings on the GPU. The following code seems like poor design to me, since there is no way for the user to change its behavior from the command line:

```python
# model_helper.py
# If a vocab size is greater than this value, put the embedding on cpu instead
VOCAB_SIZE_THRESHOLD_CPU = 50000
...
def _get_embed_device(vocab_size):
  """Decide on which device to place an embed matrix given its vocab size."""
  if vocab_size > VOCAB_SIZE_THRESHOLD_CPU:
    return "/cpu:0"
  else:
    return "/gpu:0"
...
```
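What I would expect instead is something along these lines, where a hypothetical `--embed_device` flag (not part of nmt) can override the heuristic; this is only a sketch of the kind of override I have in mind:

```python
# Hypothetical sketch, not actual nmt code: expose the placement decision
# as a command-line flag instead of a hard-coded heuristic.
import argparse

VOCAB_SIZE_THRESHOLD_CPU = 50000

def _get_embed_device(vocab_size, embed_device=None):
  """Decide where to place the embedding matrix.

  embed_device: optional user override such as "/cpu:0". When None,
  fall back to the existing vocab-size heuristic.
  """
  if embed_device:
    return embed_device
  if vocab_size > VOCAB_SIZE_THRESHOLD_CPU:
    return "/cpu:0"
  return "/gpu:0"

parser = argparse.ArgumentParser()
parser.add_argument("--embed_device", default=None,
                    help='force embedding placement, e.g. "/cpu:0"')
args = parser.parse_args()
print(_get_embed_device(vocab_size=16793, embed_device=args.embed_device))
```

With something like this, `--embed_device=/cpu:0` would make CPU-only training work without editing the source.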