weiyx15 opened 3 years ago
Hi, I found that with the same hyper-parameters but a different num_core_per_host (num_core_per_host=1 for single-GPU and num_core_per_host=6 for multi-GPU), the global_step/sec of the multi-GPU run is slightly lower than that of the single-GPU run.

num_core_per_host=6:
INFO:tensorflow:global_step/sec: 1.09456
INFO:tensorflow:loss = 1.490116e-08, step = 401200 (91.361 sec)

num_core_per_host=1:
INFO:tensorflow:global_step/sec: 1.21364
INFO:tensorflow:loss = 0.053051353, step = 62400 (82.396 sec)
Is this behavior expected, and if so, why?
System information: CUDA V10.0.130, cuDNN 7.4.1, NCCL 2.6.4, tensorflow-gpu 1.13.1 (installed via pip in a conda virtual environment)
Best Regards
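As a back-of-the-envelope check: if the per-core batch size is held fixed (an assumption; the post does not state it), each global step on 6 GPUs processes 6x the examples of a single-GPU step, so examples/sec rather than global_step/sec is the fairer throughput measure. A minimal sketch using the logged rates, with per_core_batch_size as a hypothetical placeholder:

```python
# Rough throughput comparison for the logged rates.
# ASSUMPTION: per-core batch size is fixed across both runs; the value
# below is hypothetical -- substitute your actual batch size.
per_core_batch_size = 32

runs = {
    "num_core_per_host=1": {"cores": 1, "steps_per_sec": 1.21364},
    "num_core_per_host=6": {"cores": 6, "steps_per_sec": 1.09456},
}

for name, run in runs.items():
    examples_per_sec = run["steps_per_sec"] * run["cores"] * per_core_batch_size
    print(f"{name}: {run['steps_per_sec']:.5f} global_step/sec "
          f"-> {examples_per_sec:.1f} examples/sec")

# 6 * 1.09456 / 1.21364 ~= 5.41, so the 6-GPU run moves ~5.4x more data
# per second even though its global_step/sec is slightly lower.
```

Under that assumption, a slightly lower global_step/sec on 6 GPUs still corresponds to roughly 5.4x the single-GPU throughput.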
I guess the multi-GPU run's loss decreases faster than the single-GPU run's.
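One way to see why the two logged loss values are not an apples-to-apples comparison: under the same hypothetical fixed per-core batch size as above, the runs had consumed very different amounts of data by the logged steps. A minimal sketch with the step counts from the logs:

```python
# Examples consumed by the time of each logged loss, under the same
# hypothetical fixed per-core batch size as above.
per_core_batch_size = 32  # hypothetical

seen_multi = 6 * per_core_batch_size * 401200    # num_core_per_host=6, step 401200
seen_single = 1 * per_core_batch_size * 62400    # num_core_per_host=1, step 62400

print(f"multi-GPU examples seen:  {seen_multi:,}")
print(f"single-GPU examples seen: {seen_single:,}")
print(f"ratio: {seen_multi / seen_single:.1f}x")  # ~38.6x more data
```

So at those particular log lines the multi-GPU run had seen roughly 38x more data, which by itself would explain a much lower loss regardless of per-step speed.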