nict-wisdom opened this issue 4 years ago
Facing the same issue.
Facing the same issue. Can someone share an answer for this?
Also seeing this issue. Monitoring GPU usage shows that only one GPU is being utilized when running BERT.
The current MNIST example only uses a single GPU on AMD/ROCm platforms.
I can run the MNIST example on a GPU, and it does not appear to be utilizing CPU resources. However, when using 4 GPUs, only the first device is actually utilized.
Hopefully we can get a developer response on this... ~~I can't see what would need to be modified in mnist.py to make distributed GPU training work.~~
EDIT: specifying your devices by name, e.g. `['gpu:0', 'gpu:1', 'gpu:2']`, instead of `[''] * mesh_size`, solves the problem for me.
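For reference, the relevant part of mnist.py ends up looking roughly like this after the change (the mesh shape and layout strings below are just example values, not the example's actual flag defaults):

```python
import mesh_tensorflow as mtf

# Example values only; the real script reads these from flags.
mesh_shape = mtf.convert_to_shape("all:4")
layout_rules = mtf.convert_to_layout_rules("batch:all")
mesh_size = mesh_shape.size

# Original: every mesh slice gets the empty device name, so TF places
# everything on the default device (effectively a single GPU).
# mesh_devices = [""] * mesh_size

# Fix: give each slice its own GPU so placement spreads the work out.
mesh_devices = ["gpu:%d" % i for i in range(mesh_size)]

mesh_impl = mtf.placement_mesh_impl.PlacementMeshImpl(
    mesh_shape, layout_rules, mesh_devices)
```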
@PSZehnder @nshazeer Does Mesh-TensorFlow support multi-node training (i.e. each node has some number of GPUs attached to it)? I'm using 2 nodes, each with 8 GPUs, and would like to train on the entire set of (2 nodes × 8 GPUs) = 16 GPUs. How do I configure Mesh-TensorFlow to train in a multi-node setup?
Yes, that should be possible, though I haven't done it. The GPU code just relies on device placement, so if you can construct a TF graph which can name all of the 16 GPUs as different devices, it should work...
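Untested, but I would expect the device list to look something like the following; the job/task names are only placeholders and depend entirely on your cluster spec, and the session would also have to be created against the cluster so the remote devices are reachable.

```python
import mesh_tensorflow as mtf

# Hypothetical 2-node x 8-GPU cluster: name every GPU with its job/task
# so placement can address devices on the remote worker as well.
num_nodes = 2
gpus_per_node = 8
mesh_devices = [
    "/job:worker/task:%d/device:GPU:%d" % (node, gpu)
    for node in range(num_nodes)
    for gpu in range(gpus_per_node)
]

mesh_shape = mtf.convert_to_shape("all:16")              # example value
layout_rules = mtf.convert_to_layout_rules("batch:all")  # example value

mesh_impl = mtf.placement_mesh_impl.PlacementMeshImpl(
    mesh_shape, layout_rules, mesh_devices)
```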
@nshazeer, thanks for your reply. If I can make all 16 GPUs visible, how will data loading be done in a 2-node × 8-GPU setup? Will the data be loaded through one CPU on node0 (where I run the script, so one CPU feeds all 16 GPUs), or will data loading be done from both CPUs (node0 and node1), so each CPU feeds only the 8 GPUs it is connected to?
@nict-wisdom do you have a snippet showing how you used the ProfilerHook? I am struggling with it a bit at the moment.
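From the TF docs I assume it is wired up roughly like the toy Estimator below (the model is throwaway, only there to show where the hook goes), but please correct me if that's not how you used it:

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

def model_fn(features, labels, mode):
  # Throwaway linear model, just enough to have something to profile.
  logits = tf.layers.dense(features["x"], 10)
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
  train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
      loss, global_step=tf.train.get_global_step())
  return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def input_fn():
  x = np.random.rand(256, 32).astype(np.float32)
  y = np.random.randint(0, 10, size=(256,)).astype(np.int64)
  return tf.data.Dataset.from_tensor_slices(({"x": x}, y)).repeat().batch(32)

# Dump a Chrome-trace timeline every 100 steps; the files in output_dir
# can be opened in chrome://tracing to see which ops ran on which device.
profiler_hook = tf.train.ProfilerHook(
    save_steps=100, output_dir="/tmp/profile", show_dataflow=True)

estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/toy_model")
estimator.train(input_fn=input_fn, steps=300, hooks=[profiler_hook])
```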
I've met the same problem. Can anyone on the team reply to this issue?
We are also facing the same issue. Any help in this context will be highly appreciated.
We tried to run Mesh-TensorFlow to train T5 on GPUs following the instructions on T5's repository, but the training is extremely slow.
The training script successfully detected the GPUs (showing "Adding visible gpu devices: ..."), but most of the computation seems to run on the CPU. By enabling log_device_placement, we can see many operators placed on both CPUs and GPUs. ProfilerHook showed that it actually uses both, but I couldn't tell whether this behavior is expected or not.
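(For completeness: in an Estimator-based script, log_device_placement can be switched on through the RunConfig's session_config, roughly as below; the model_dir is only illustrative.)

```python
import tensorflow as tf

# Log every op's device assignment to stderr, so we can check whether the
# heavy ops actually land on the GPUs or stay on the CPU.
session_config = tf.compat.v1.ConfigProto(log_device_placement=True)
run_config = tf.estimator.RunConfig(
    model_dir="/tmp/t5_model",          # illustrative path
    session_config=session_config)

# estimator = tf.estimator.Estimator(model_fn=..., config=run_config)
```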
I am wondering if Mesh-TensorFlow runs on GPUs in a practical sense. I found an issue that mentioned a similar problem, but it was closed with no answer (#35).
I also failed to find reliable documentation about training on multiple GPUs. An existing issue, #20, raised the same question, but no answer was given.
I would appreciate it if someone could give us any information regarding the above questions.