openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org

Time-consuming problem at MPI_INIT function #6658

Open kihangyoun opened 3 years ago

kihangyoun commented 3 years ago

Hello All,

There is a problem where MPI startup (MPI_Init) takes a long time with a large number of MPI ranks. The time is spent in the endpoint configuration reported at ucp_worker.c:1719 (the UCX INFO ep_cfg[0] through ep_cfg[2] lines). With 76,000 MPI ranks, MPI_Init takes 68~72 seconds. Here is the printed log; I suspect the ep_cfg lines are the bottleneck.

Intel MPI version is 2021.2.0, UCX is 1.10.0, and MOFED is 5.2-1.0.4.0. Please let me know if you have any suggestions or need any additional information. Thanks.

```
[0] MPI startup(): libfabric version: 1.11.0-impi
libfabric:376032:core:core:ofi_register_provider():427 registering provider: ofi_rxm (111.0)
libfabric:376032:core:core:ofi_register_provider():427 registering provider: mlx (1.4)
libfabric:376032:core:core:ofi_register_provider():427 registering provider: ofi_hook_noop (111.0)
libfabric:376032:core:core:fi_getinfo():1117 Found provider with the highest priority mlx, must_use_util_prov = 0
[0] MPI startup(): libfabric provider: mlx
libfabric:376032:core:core:fi_fabric():1406 Opened fabric: mlx
[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 1, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1
[0] MPI startup(): addrnamelen: 1024
[1618380329.430667] [maru3685:332864:0] ucp_worker.c:1719 UCX INFO ep_cfg[0]: tag(dc_mlx5/mlx5_0:1); rma(dc_mlx5/mlx5_0:1);
[1618380329.858702] [maru3658:330396:0] ucp_worker.c:1719 UCX INFO ep_cfg[1]: tag(dc_mlx5/mlx5_0:1 posix/memory cma/memory); rma(dc_mlx5/mlx5_0:1 posix/memory sysv/memory);
[1618380329.859058] [maru3658:330352:0] ucp_worker.c:1719 UCX INFO ep_cfg[2]: tag(self/memory cma/memory dc_mlx5/mlx5_0:1); rma(self/memory posix/memory sysv/memory);
....
[43743] MPI startup(): selected platform: icx
[0] MPI startup(): Load tuning file: "/opt/local/mpi/2021.2.0/etc/tuning_icx_shm-ofi_mlx.dat"
[0] MPI startup(): Rank  Pid     Node name  Pin cpu
[0] MPI startup(): 0     276153  0721.maru  0
[0] MPI startup(): 1     276154  0721.maru  1
[0] MPI startup(): 2     276155  0721.maru  2
```
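For anyone reproducing this, a minimal way to time MPI_Init by itself, independent of the launcher's own reporting, is a small wrapper along these lines. This is an illustrative sketch, not the exact program used for the numbers above; it assumes mpicc and a POSIX monotonic clock:

```c
/* init_time.c - minimal sketch: measure the wall-clock time of MPI_Init.
 * Build: mpicc -o init_time init_time.c
 * Run:   mpirun -n <ranks> ./init_time
 * CLOCK_MONOTONIC is used because MPI_Wtime() is not guaranteed to be
 * callable before MPI_Init. */
#include <mpi.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);   /* wall clock before init */
    MPI_Init(&argc, &argv);
    clock_gettime(CLOCK_MONOTONIC, &t1);   /* wall clock after init */

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("MPI_Init took %.2f s\n", secs);
    }

    MPI_Finalize();
    return 0;
}
```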

brminich commented 3 years ago

Hi @kihangyoun,

kihangyoun commented 3 years ago

Hi @brminich, Thanks for your attention.

brminich commented 3 years ago

Were you able to get lower start-up times with any transport other than UCX? For now I'd attribute this slowness to the MPI_Init implementation and the collectives being used inside it.
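A hedged sketch of one way to run that comparison: force libfabric to select a non-UCX provider before MPI_Init, then time startup with the wrapper above. The "verbs" provider name is an assumption here and must actually exist in the libfabric build that Intel MPI loads; in practice the variable would normally be exported via the launcher (e.g. mpirun -genv FI_PROVIDER verbs) rather than set in code:

```c
/* provider_test.c - sketch: steer Intel MPI / libfabric away from the
 * UCX-backed "mlx" provider, to compare MPI_Init times.
 * "verbs" is an assumed alternative provider, not taken from the
 * report above. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Must happen before MPI_Init, which is when libfabric
     * enumerates and selects providers. */
    setenv("FI_PROVIDER", "verbs", 1);

    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("initialized with FI_PROVIDER=%s\n", getenv("FI_PROVIDER"));

    MPI_Finalize();
    return 0;
}
```

If startup stays slow regardless of provider, that would point at the MPI_Init collectives rather than the UCX endpoint configuration.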