Sergei-Lebedev closed this 2 months ago
To reproduce the bug prior to the fix:
liorpa@mtl-e2e-slurm05:/mtrsysgwork/liorpa$ N=90;PW=8;mpirun -np $((N*8)) --host hgx-isr1-037:8,hgx-isr1-038:8,hgx-isr1-039:8,hgx-isr1-040:8,hgx-isr1-041:8,hgx-isr1-042:8,hgx-isr1-043:8,hgx-isr1-044:8,hgx-isr1-045:8,hgx-isr1-046:8,hgx-isr1-047:8,hgx-isr1-048:8,hgx-isr1-049:8,hgx-isr1-050:8,hgx-isr1-051:8,hgx-isr1-052:8,hgx-isr1-053:8,hgx-isr1-054:8,hgx-isr1-055:8,hgx-isr1-056:8,hgx-isr1-057:8,hgx-isr1-058:8,hgx-isr1-059:8,hgx-isr1-060:8,hgx-isr1-061:8,hgx-isr1-062:8,hgx-isr1-063:8,hgx-isr1-064:8,hgx-isr1-065:8,hgx-isr1-066:8,hgx-isr1-067:8,hgx-isr1-069:8,hgx-isr1-070:8,hgx-isr1-071:8,hgx-isr1-072:8,hgx-isr1-073:8,hgx-isr1-074:8,hgx-isr1-075:8,hgx-isr1-076:8,hgx-isr1-077:8,hgx-isr1-078:8,hgx-isr1-079:8,hgx-isr1-080:8,hgx-isr1-081:8,hgx-isr1-082:8,hgx-isr1-083:8,hgx-isr1-084:8,hgx-isr1-085:8,hgx-isr1-086:8,hgx-isr1-087:8,hgx-isr1-088:8,hgx-isr1-089:8,hgx-isr1-090:8,hgx-isr1-091:8,hgx-isr1-092:8,hgx-isr1-093:8,hgx-isr1-094:8,hgx-isr1-095:8,hgx-isr1-096:8,hgx-isr1-097:8,hgx-isr1-098:8,hgx-isr1-099:8,hgx-isr1-100:8,hgx-isr1-101:8,hgx-isr1-102:8,hgx-isr1-103:8,hgx-isr1-104:8,hgx-isr1-105:8,hgx-isr1-106:8,hgx-isr1-107:8,hgx-isr1-108:8,hgx-isr1-109:8,hgx-isr1-110:8,hgx-isr1-112:8,hgx-isr1-113:8,hgx-isr1-114:8,hgx-isr1-115:8,hgx-isr1-116:8,hgx-isr1-117:8,hgx-isr1-118:8,hgx-isr1-119:8,hgx-isr1-120:8,hgx-isr1-121:8,hgx-isr1-122:8,hgx-isr1-123:8,hgx-isr1-124:8,hgx-isr1-125:8,hgx-isr1-126:8,hgx-isr1-127:8,hgx-isr1-128:8 --map-by node --mca coll_hcoll_enable 0 --bind-to socket -x LD_LIBRARY_PATH -x MELLANOX_VISIBLE_DEVICES=0,3,4,5,6,9,10,11 -x CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -x UCX_IB_GID_INDEX=3 -x UCC_CL_HIER_FULL_SBGP_TLS=ucp -x UCC_CL_HIER_NODE_SBGP_TLS=cuda -x UCC_CL_HIER_TUNE=alltoall:@node_split:inf -x UCC_CLS=basic,hier -x UCC_TL_UCP_ALLTOALL_PAIRWISE_NUM_POSTS=$PW -x UCC_TL_UCP_ALLTOALLV_PAIRWISE_NUM_POSTS=$PW -x UCX_RNDV_THRESH=0 -x UCC_TLS=ucp,cuda -x OMPI_MCA_btl=tcp,self -x OMPI_MCA_btl_tcp_if_include=eno8303 -x UCX_NET_DEVICES=mlx5_0:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_9:1,mlx5_10:1,mlx5_11:1 $HPCX_UCC_DIR/bin/ucc_perftest -c alltoall -b 2097152 -e $((1024/N))M -m cuda -F
Collective: Alltoall
Memory type: cuda
Datatype: float32
Reduction: N/A
Inplace: 0
Warmup:
  small 100
  large 20
Iterations:
  small 1000
  large 200
     Count        Size                            Time, us         Bandwidth, GB/s
                               avg         min         max     avg     max     min
   2097152     8388608   266035.88   262183.51   271185.61   22.67   23.00   22.24
   4194304    16777216   428363.17   423339.08   434625.75   28.16   28.49   27.75
   8388608    33554432   285558.47   281186.49   289877.72   84.49   85.80   83.23
@Sergei-Lebedev @samnordmann @manjugv can we merge it?
What
Fix an integer overflow in alltoall.
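
For readers wondering why this run is large enough to hit an integer overflow, here is a rough sketch of the arithmetic in C. It is not the code changed by this PR. It assumes that the Count column in the perftest output is the per-peer element count, and that the failing computation was a byte offset of the form peer * count * element size held in a 32-bit int. The helper names (`peer_offset_bytes`, `overflows_int32`, `low32`) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of one peer's block, computed entirely in 64-bit arithmetic.
 * Hypothetical helper; the parameter meanings are assumptions, see above. */
static uint64_t peer_offset_bytes(uint64_t peer, uint64_t count_per_peer,
                                  uint64_t elem_size)
{
    assert(elem_size > 0);
    return peer * count_per_peer * elem_size;
}

/* Nonzero if the offset cannot be represented in a signed 32-bit int. */
static int overflows_int32(uint64_t offset)
{
    return offset > (uint64_t)INT32_MAX;
}

/* The value left over if only the low 32 bits of the offset are kept,
 * which is what a too-narrow index would typically end up holding. */
static uint32_t low32(uint64_t offset)
{
    return (uint32_t)(offset & 0xFFFFFFFFu);
}
```

With the smallest message in the run above (2097152 float32 elements, 4 bytes each, 720 ranks), `peer_offset_bytes(719, 2097152, 4)` is 6031409152 bytes, well past `INT32_MAX`. Under these assumptions, peer 256 is the first one whose offset no longer fits in a signed 32-bit int, so keeping such offsets in a 64-bit type such as `size_t` or `uint64_t` avoids the problem.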