Some questions about WMT German-English

I requested 8 GPUs for my job, but I found that it only runs on 4 of them. Why? The command:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m nmt.nmt --src=de --tgt=en --hparams_path=nmt/standard_hparams/wmt16_gnmt_4_layer.json --out_dir=/tmp/deen_gnmt --vocab_prefix=/data/xinyx/wmt_data/vocab.bpe.32000 --train_prefix=/data/xinyx/wmt_data/train.tok.clean.bpe.32000 --dev_prefix=/data/xinyx/wmt_data/newstest2013.tok.bpe.32000 --test_prefix=/data/xinyx/wmt_data/newstest2015.tok.bpe.32000 --num_gpus=8
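One guess at the cause, written as a minimal sketch: it assumes the model places each RNN layer on a GPU chosen by `layer_index % num_gpus`. I have not confirmed this against the nmt code, and the helper names `device_for_layer` and `devices_used` are hypothetical. If the assumption holds, a 4-layer model would only ever touch 4 devices, no matter how many are visible.

```python
def device_for_layer(layer_index, num_gpus):
    """Hypothetical helper: pick a device for a layer by round-robin.

    Assumption (not confirmed from the nmt source): layer i is placed
    on GPU (i % num_gpus), falling back to the CPU when num_gpus is 0.
    """
    if num_gpus == 0:
        return "/cpu:0"
    return "/gpu:%d" % (layer_index % num_gpus)


def devices_used(num_layers, num_gpus):
    """Return the sorted set of devices that num_layers layers would occupy."""
    return sorted({device_for_layer(i, num_gpus) for i in range(num_layers)})


if __name__ == "__main__":
    # 4 layers (as in wmt16_gnmt_4_layer.json) with --num_gpus=8
    print(devices_used(num_layers=4, num_gpus=8))
    # 8 layers with --num_gpus=8, for comparison
    print(devices_used(num_layers=8, num_gpus=8))
```

Under this assumption, the number of busy GPUs is bounded by the number of layers, so passing `--num_gpus=8` with a 4-layer configuration would leave half of the devices idle.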

bug

Another question: I do not see any speedup when going from 1 GPU to 8 GPUs on a DGX-1 with P100s.

I am using the default wmt16_gnmt_4_layer.json file. How can I get a speedup with multiple GPUs? Can you give me some advice?

tensorflow / nmt

Some questions about WMT German-English #347