openucx / ucc

Unified Collective Communication Library
https://openucx.github.io/ucc/
BSD 3-Clause "New" or "Revised" License

TL/SHARP: check comm size in sharp ctx create #990

Closed by Sergei-Lebedev 6 days ago

Sergei-Lebedev commented 6 days ago

What

Don't create the SHARP context if the world size is less than 2.
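
As a rough illustration of the kind of guard described above, here is a minimal, self-contained sketch. It is not the actual patch: the `sketch_*` types, status codes, and function name are hypothetical stand-ins rather than UCC's real definitions, and the real check lives in the TL/SHARP context creation path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; NOT the real ucc_status_t values. */
typedef enum {
    SKETCH_OK                = 0,
    SKETCH_ERR_NOT_SUPPORTED = -1
} sketch_status_t;

/* Hypothetical stand-in for a transport-layer context. */
typedef struct {
    int world_size;
    int initialized; /* 1 once topology/subgroup setup would have run */
} sketch_ctx_t;

/*
 * Assumed shape of the fix: reject the context early when there are
 * fewer than 2 ranks, so no topology or subgroup setup is attempted.
 */
static sketch_status_t sketch_sharp_ctx_create(sketch_ctx_t *ctx,
                                               int world_size)
{
    assert(ctx != NULL);

    ctx->world_size  = world_size;
    ctx->initialized = 0;

    if (world_size < 2) {
        /* Nothing to offload for a single rank; skip this context. */
        return SKETCH_ERR_NOT_SUPPORTED;
    }

    /* The real code would query the topology and build subgroups here. */
    ctx->initialized = 1;
    return SKETCH_OK;
}
```

Returning a "not supported" status instead of crashing lets the caller carry on without this transport layer. Whether the real patch returns that exact status is an assumption here.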

Why?

Fixes the crash below, raised from ucc_sbgp_create() during SHARP context initialization:

==== backtrace (tid:   1051) ====
0 0x0000000000042520 __sigaction()  ???:0
1 0x0000000000025562 ucc_sbgp_create()  /build-result/src/hpcx-v2.19-gcc-mlnx_ofed-redhat7-cuda12-x86_64/ucc-0b4a0780918900fa497b1e6a65485247fecec4a2/src/components/topo/ucc_sbgp.c:599
2 0x0000000000024c95 ucc_topo_get_sbgp()  /build-result/src/hpcx-v2.19-gcc-mlnx_ofed-redhat7-cuda12-x86_64/ucc-0b4a0780918900fa497b1e6a65485247fecec4a2/src/components/topo/ucc_topo.c:224
3 0x0000000000004ce2 ucc_tl_sharp_context_init()  /build-result/src/hpcx-v2.19-gcc-mlnx_ofed-redhat7-cuda12-x86_64/ucc-0b4a0780918900fa497b1e6a65485247fecec4a2/src/components/tl/sharp/tl_sharp_context.c:294
4 0x0000000000005158 ucc_tl_sharp_context_create_epilog()  /build-result/src/hpcx-v2.19-gcc-mlnx_ofed-redhat7-cuda12-x86_64/ucc-0b4a0780918900fa497b1e6a65485247fecec4a2/src/components/tl/sharp/tl_sharp_context.c:443
5 0x000000000000d597 ucc_context_create_proc_info()  /build-result/src/hpcx-v2.19-gcc-mlnx_ofed-redhat7-cuda12-x86_64/ucc-0b4a0780918900fa497b1e6a65485247fecec4a2/src/core/ucc_context.c:808
6 0x0000000000098bd2 ucc::context_wrapper::context_wrapper()  ???:0