zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Apache License 2.0

Multi-gpu slower than single-gpu #269

Open weiyx15 opened 3 years ago

weiyx15 commented 3 years ago

Hi, I found that with the same hyper-parameters but different num_core_per_host (num_core_per_host=1 for single-GPU and num_core_per_host=6 for multi-GPU), the global_step/sec of multi-GPU is slightly lower than that of single-GPU. num_core_per_host=6:

INFO:tensorflow:global_step/sec: 1.09456
INFO:tensorflow:loss = 1.490116e-08, step = 401200 (91.361 sec)

num_core_per_host=1:

INFO:tensorflow:global_step/sec: 1.21364
INFO:tensorflow:loss = 0.053051353, step = 62400 (82.396 sec)

Is this phenomenon reasonable, and if so, why?
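(One way to sanity-check this: under synchronous data parallelism, each global step on 6 GPUs may process more examples than a step on 1 GPU, so comparing global_step/sec alone can be misleading. A minimal sketch, assuming the batch size is per-core so the per-step workload scales with the number of GPUs; the batch size of 8 below is a hypothetical value, not taken from my logs:)

```python
# Compare throughput in examples/sec instead of global_step/sec.
# Assumption: train_batch_size is the per-core batch, so a step on
# 6 GPUs processes 6x the examples of a step on 1 GPU.
per_core_batch = 8  # hypothetical illustration value

single_gpu_steps_per_sec = 1.21364  # from the num_core_per_host=1 log
multi_gpu_steps_per_sec = 1.09456   # from the num_core_per_host=6 log

single_gpu_examples_per_sec = single_gpu_steps_per_sec * per_core_batch * 1
multi_gpu_examples_per_sec = multi_gpu_steps_per_sec * per_core_batch * 6

print(f"1 GPU : {single_gpu_examples_per_sec:.2f} examples/sec")
print(f"6 GPUs: {multi_gpu_examples_per_sec:.2f} examples/sec")
```

(If the batch size were instead a per-host total split across cores, the two runs would be doing the same work per step and the lower global_step/sec would indeed mean slower training.)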

System Information: cuda V10.0.130 cudnn 7.4.1 nccl 2.6.4 tensorflow-gpu 1.13.1 (from pip in conda virtual environment)

Best Regards

guotong1988 commented 3 years ago

I guess the multi-GPU loss decreases faster than the single-GPU loss.