mindspore-ai / mindspore

MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
https://gitee.com/mindspore/mindspore
Apache License 2.0

[Question] Does MindSpore support multi-GPU with auto-parallel strategy? #31

Open FudanEMWLab opened 4 years ago

FudanEMWLab commented 4 years ago

I built from source in a docker environment based on the dockerfile (docker/mindspore-gpu/devel/Dockerfile) and tried some tests under mindspore/tests/ut/python/parallel. I modified the tests by adding the two lines below:

    context.set_context(mode=context.GRAPH_MODE, device_target="GPU")
    init('nccl')
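For reference, a fuller sketch of that kind of setup as a runnable script (a sketch only: the `set_auto_parallel_context` arguments, the two-MatMul network, and `device_num=8` are illustrative choices in the style of the tests under tests/ut/python/parallel, and API names follow the 0.x era):

```python
# Sketch of a multi-GPU auto-parallel setup in the style of the
# tests/ut/python/parallel cases (illustrative; exact API names may
# differ across MindSpore versions).
import numpy as np

import mindspore.context as context
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.communication.management import init
from mindspore.ops import operations as P

context.set_context(mode=context.GRAPH_MODE, device_target="GPU")
init('nccl')  # loads libgpu_collective.so; needs NCCL and MPI at runtime
# Let the framework search operator sharding strategies automatically.
context.set_auto_parallel_context(parallel_mode="auto_parallel", device_num=8)

class TwoMatMulNet(nn.Cell):
    """Two chained MatMuls; auto_parallel searches their sharding strategies."""
    def __init__(self):
        super(TwoMatMulNet, self).__init__()
        self.matmul1 = P.MatMul()
        self.matmul2 = P.MatMul()

    def construct(self, x, y, b):
        return self.matmul2(self.matmul1(x, y), b)

x = Tensor(np.ones([128, 32]).astype(np.float32))
y = Tensor(np.ones([32, 64]).astype(np.float32))
b = Tensor(np.ones([64, 64]).astype(np.float32))
out = TwoMatMulNet()(x, y, b)
```

A script like this would be launched with mpirun so that each process drives one GPU.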

I used the commands below to build the source and install the package:

    bash build.sh -e gpu -M on -z
    pip install build/package/mindspore_gpu-0.1.0-cp37-cp37m-linux_x86_64.whl

I checked the folder where the package was installed; libgpu_collective.so, which failed to load during NCCL initialization, is present there.

The tests failed with the error messages below. Is there a guide for running MindSpore with multiple GPUs and different parallel modes?

Thanks

################## Error Message ####################

        elif backend_name == "nccl":
            init_gpu_collective()
    E   RuntimeError: mindspore/ccsrc/device/gpu/distribution/collective_init.cc:35 InitCollective] Loading libgpu_collective.so failed. Many reasons could cause this:
    E   1.libgpu_collective.so is not installed.
    E   2.nccl is not installed or found.
    E   3.mpi is not installed or found

    ../../../../mindspore/communication/management.py:69: RuntimeError
    -------------------- Captured stderr call --------------------
    [ERROR] ME(102,python):2020-04-19-14:19:19.404.520 [mindspore/ccsrc/device/gpu/distribution/collective_init.cc:35] InitCollective] Loading libgpu_collective.so failed. Many reasons could cause this: 1.libgpu_collective.so is not installed. 2.nccl is not installed or found. 3.mpi is not installed or found
    ==================== short test summary info ====================
    FAILED test_matmul_tensor.py::test_two_matmul - RuntimeError: mindspore/ccsrc/device/gpu/distribution/collective_init.cc:35 InitCollective] Loading libgpu_collective.so failed. Many reasons c...
    ==================== 1 failed in 0.96s ====================
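As an aside, one way to surface the underlying loader error is to dlopen the library directly with ctypes, which makes the dynamic loader name the exact missing dependency. This is a hedged diagnostic sketch, not from the thread; the `lib` subdirectory below is an assumption about the install layout:

```python
# Diagnostic sketch: dlopen libgpu_collective.so directly so the dynamic
# loader reports which dependency is missing, e.g.
# "libnccl.so.2: cannot open shared object file".
# The "lib" subdirectory is an assumption; adjust to your install path.
import ctypes
import os

import mindspore

lib = os.path.join(os.path.dirname(mindspore.__file__), "lib", "libgpu_collective.so")
try:
    ctypes.CDLL(lib)
    print("libgpu_collective.so loaded successfully")
except OSError as err:
    print("loader error:", err)
```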

leonwanghui commented 4 years ago

@FudanEMWLab Hi, firstly thanks for trying out MindSpore! Before answering your question, please note that it's not recommended to test MindSpore examples with GPU (especially with NCCL) directly in the devel environment. Could you transfer the whl package to the mindspore/mindspore-gpu:runtime docker image and retry your code? If the error persists, we will look into what's going on.

leonwanghui commented 4 years ago

If you want to try some test code for the multi-GPU scenario, please try https://github.com/mindspore-ai/mindspore/tree/master/tests/st/nccl, for example:

mpirun -n 8 pytest -s test_nccl_reduce_scatter_op.py
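For reference, a minimal sketch of the kind of test that directory contains (illustrative: the network and the final assertion, which assumes the `mpirun -n 8` launch above, are not copied from the repo):

```python
# Minimal AllReduce smoke test in the style of tests/st/nccl; run it
# under mpirun so that N processes each sum their tensor across ranks.
import numpy as np

import mindspore.context as context
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.communication.management import init
from mindspore.ops import operations as P

context.set_context(mode=context.GRAPH_MODE, device_target="GPU")
init('nccl')

class AllReduceNet(nn.Cell):
    def __init__(self):
        super(AllReduceNet, self).__init__()
        self.all_reduce = P.AllReduce()  # default reduce op is sum

    def construct(self, x):
        return self.all_reduce(x)

def test_all_reduce():
    x = Tensor(np.ones([3, 4]).astype(np.float32))
    out = AllReduceNet()(x)
    # With mpirun -n N, every element of `out` should equal N.
    assert (out.asnumpy() == 8).all()  # assumes mpirun -n 8 as above
```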
nizhaoqiao commented 4 years ago

@FudanEMWLab PaddlePaddle supports multi-GPU training pretty well; you can refer to https://github.com/PaddlePaddle/Fleet/ for more details.

hannibalhuang commented 4 years ago

> @FudanEMWLab PaddlePaddle supports multi-GPU training pretty well; you can refer to https://github.com/PaddlePaddle/Fleet/ for more details.

Hi @nizhaoqiao, even though your GitHub account is weirdly empty, I guess you are a developer participating in the Paddle community, so you are more than welcome to join the conversation here in MindSpore! Open source is all about camaraderie and friendship :)

I'm interested in what you meant by "pretty well" in:

> PaddlePaddle supports multi-GPU training pretty well.

Paddle Bench Numbers

MindSpore Bench Numbers

There is an article written by a developer who independently ran benchmarks on MindSpore and PyTorch 1.5, on Ascend 910 and 2080 Ti/Tesla respectively. It shows that MindSpore, without any targeted optimization, reaches around 230 on a single GPU. It would be great if you or other developers could run a multi-GPU benchmark; I would guess the numbers should be pretty good.

On "wellness"

I think that for MindSpore, a newly open-sourced framework, to be on par with PaddlePaddle, a great four-year-old open source framework, on the type of hardware that is not the primary focus of MindSpore support, we could probably agree that:

MindSpore does really well

Just some thoughts :) Welcome to participate in our community more often :)

nizhaoqiao commented 4 years ago

@hannibalhuang Hahaha, don't be so nervous bro, I mean no malice~ I do agree MindSpore does really well on Ascend 910! Let's work hard to build more competitive solutions for the community and developers~ :)

hannibalhuang commented 4 years ago

> @hannibalhuang Hahaha, don't be so nervous bro, I mean no malice~ I do agree MindSpore does really well on Ascend 910! Let's work hard to build more competitive solutions for the community and developers~ :)

Cannot agree more with the last sentence :) Just a quick response: I don't know where you picked up "nervous" from my reply, which was just a standard open source community exchange, and I didn't imply in any way that you acted with a "malevolent" attitude. Malice is too strong a word for open source discussions :)

Anyway, you're welcome to provide your own benchmarks running MindSpore on multiple GPUs as I suggested; I think it'll run pretty well :P

nizhaoqiao commented 4 years ago

> > @hannibalhuang Hahaha, don't be so nervous bro, I mean no malice~ I do agree MindSpore does really well on Ascend 910! Let's work hard to build more competitive solutions for the community and developers~ :)
>
> Cannot agree more with the last sentence :) Just a quick response: I don't know where you picked up "nervous" from my reply, which was just a standard open source community exchange, and I didn't imply in any way that you acted with a "malevolent" attitude. Malice is too strong a word for open source discussions :)
>
> Anyway, you're welcome to provide your own benchmarks running MindSpore on multiple GPUs as I suggested; I think it'll run pretty well :P

👍 Keep it up!~