openucx / ucc

Unified Collective Communication Library
https://openucx.github.io/ucc/
BSD 3-Clause "New" or "Revised" License

CL/HIER: fix int overflow in alltoall #944

Closed Sergei-Lebedev closed 2 months ago

Sergei-Lebedev commented 3 months ago

What

Fix an int overflow in the CL/HIER alltoall.
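The diff is not shown in this thread, so the following is only a general sketch of this class of bug, not the actual change. It assumes the overflow comes from multiplying a peer index, an element count, and a datatype size in 32-bit arithmetic. The helper names `block_offset_32` and `block_offset_wide` are hypothetical and are not UCC functions.

```c
#include <stdint.h>

/* Hypothetical helper, not UCC code: byte offset of a peer's block computed
 * entirely in 32-bit arithmetic. Unsigned types are used so the wraparound
 * is well defined; with plain int the same overflow is undefined behavior. */
static uint32_t block_offset_32(uint32_t peer, uint32_t count, uint32_t dt_size)
{
    /* The result is truncated to 32 bits, i.e. taken modulo 2^32. */
    return (uint32_t)(peer * count * dt_size);
}

/* Same offset, with every operand widened to 64 bits before multiplying,
 * so the product is exact for any combination of 32-bit inputs that fits
 * in 64 bits. */
static uint64_t block_offset_wide(uint32_t peer, uint32_t count, uint32_t dt_size)
{
    return (uint64_t)peer * (uint64_t)count * (uint64_t)dt_size;
}
```

With 720 ranks and 2097152 float32 elements per peer, the block of the last peer (index 719) starts at 719 * 2097152 * 4 = 6031409152 bytes, which does not fit in 32 bits. The 32-bit helper wraps to 1736441856, while the widened helper returns the exact value.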

lappazos commented 3 months ago

To reproduce the bug prior to the fix:

liorpa@mtl-e2e-slurm05:/mtrsysgwork/liorpa$ N=90;PW=8;mpirun -np $((N*8)) --host hgx-isr1-037:8,hgx-isr1-038:8,hgx-isr1-039:8,hgx-isr1-040:8,hgx-isr1-041:8,hgx-isr1-042:8,hgx-isr1-043:8,hgx-isr1-044:8,hgx-isr1-045:8,hgx-isr1-046:8,hgx-isr1-047:8,hgx-isr1-048:8,hgx-isr1-049:8,hgx-isr1-050:8,hgx-isr1-051:8,hgx-isr1-052:8,hgx-isr1-053:8,hgx-isr1-054:8,hgx-isr1-055:8,hgx-isr1-056:8,hgx-isr1-057:8,hgx-isr1-058:8,hgx-isr1-059:8,hgx-isr1-060:8,hgx-isr1-061:8,hgx-isr1-062:8,hgx-isr1-063:8,hgx-isr1-064:8,hgx-isr1-065:8,hgx-isr1-066:8,hgx-isr1-067:8,hgx-isr1-069:8,hgx-isr1-070:8,hgx-isr1-071:8,hgx-isr1-072:8,hgx-isr1-073:8,hgx-isr1-074:8,hgx-isr1-075:8,hgx-isr1-076:8,hgx-isr1-077:8,hgx-isr1-078:8,hgx-isr1-079:8,hgx-isr1-080:8,hgx-isr1-081:8,hgx-isr1-082:8,hgx-isr1-083:8,hgx-isr1-084:8,hgx-isr1-085:8,hgx-isr1-086:8,hgx-isr1-087:8,hgx-isr1-088:8,hgx-isr1-089:8,hgx-isr1-090:8,hgx-isr1-091:8,hgx-isr1-092:8,hgx-isr1-093:8,hgx-isr1-094:8,hgx-isr1-095:8,hgx-isr1-096:8,hgx-isr1-097:8,hgx-isr1-098:8,hgx-isr1-099:8,hgx-isr1-100:8,hgx-isr1-101:8,hgx-isr1-102:8,hgx-isr1-103:8,hgx-isr1-104:8,hgx-isr1-105:8,hgx-isr1-106:8,hgx-isr1-107:8,hgx-isr1-108:8,hgx-isr1-109:8,hgx-isr1-110:8,hgx-isr1-112:8,hgx-isr1-113:8,hgx-isr1-114:8,hgx-isr1-115:8,hgx-isr1-116:8,hgx-isr1-117:8,hgx-isr1-118:8,hgx-isr1-119:8,hgx-isr1-120:8,hgx-isr1-121:8,hgx-isr1-122:8,hgx-isr1-123:8,hgx-isr1-124:8,hgx-isr1-125:8,hgx-isr1-126:8,hgx-isr1-127:8,hgx-isr1-128:8 --map-by node --mca coll_hcoll_enable 0 --bind-to socket -x LD_LIBRARY_PATH -x MELLANOX_VISIBLE_DEVICES=0,3,4,5,6,9,10,11 -x CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -x UCX_IB_GID_INDEX=3 -x UCC_CL_HIER_FULL_SBGP_TLS=ucp -x UCC_CL_HIER_NODE_SBGP_TLS=cuda -x UCC_CL_HIER_TUNE=alltoall:@node_split:inf -x UCC_CLS=basic,hier -x UCC_TL_UCP_ALLTOALL_PAIRWISE_NUM_POSTS=$PW -x UCC_TL_UCP_ALLTOALLV_PAIRWISE_NUM_POSTS=$PW -x UCX_RNDV_THRESH=0 -x UCC_TLS=ucp,cuda -x OMPI_MCA_btl=tcp,self -x OMPI_MCA_btl_tcp_if_include=eno8303 -x 
UCX_NET_DEVICES=mlx5_0:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_9:1,mlx5_10:1,mlx5_11:1 $HPCX_UCC_DIR/bin/ucc_perftest -c alltoall -b 2097152 -e $((1024/N))M -m cuda -F
Collective:             Alltoall
Memory type:            cuda
Datatype:               float32
Reduction:              N/A
Inplace:                0
Warmup:
  small                 100
  large                 20
Iterations:
  small                 1000
  large                 200

   Count        Size                Time, us                           Bandwidth, GB/s
                             avg         min         max         avg         max         min
 2097152     8388608   266035.88   262183.51   271185.61       22.67       23.00       22.24
 4194304    16777216   428363.17   423339.08   434625.75       28.16       28.49       27.75
 8388608    33554432   285558.47   281186.49   289877.72       84.49       85.80       83.23

lappazos commented 3 months ago

@Sergei-Lebedev @samnordmann @manjugv can we merge it?