Closed (yanminglai closed this issue 1 month ago)
Hi @yanminglai Thanks for this report.
Do not set UCC_TLS=mlx5. It may seem counterintuitive, but the reason is that TL/MLX5 uses TL/UCP for service collectives, so TL/UCP has to remain available. For example:
mpirun -x UCC_COLL_TRACE=info -x UCC_TL_MLX5_NET_DEVICES=mlx5_0:1 -x UCX_NET_DEVICES=mlx5_1:1 -x UCC_TL_MLX5_TUNE=inf --mca coll_ucc_enable 0 --map-by ppr:2:node -np 4 test/mpi/ucc_test_mpi -c alltoall -t world -d uint8 -O 0 -v -m 1:128
Other remarks:
Add --mca coll_ucc_enable 0 to your mpirun command. This prevents Open-MPI from initializing a second instance of TL/MLX5, which could preempt the entirety of the device memory.
Set -x UCX_NET_DEVICES=<another_device> to a device other than the one used for TL/MLX5 or, alternatively, add -x UCX_RC_MLX5_DM_COUNT=0 -x UCX_DC_MLX5_DM_COUNT=0.
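To make the two options above concrete, here is a minimal shell sketch that only assembles and prints the mpirun command line; it does not run anything. The helper name build_cmd and the MLX5_DEV / UCX_DEV variables are hypothetical, the device names are placeholders copied from the example command, and every flag comes from this thread. Treat it as an illustration under those assumptions, not as an official recipe.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) an mpirun command line
# that follows the remarks in this thread. Device names are placeholders.

MLX5_DEV="${MLX5_DEV:-mlx5_0:1}"   # device handed to TL/MLX5
UCX_DEV="${UCX_DEV:-mlx5_1:1}"     # device handed to UCX (used by TL/UCP)

build_cmd() {
    if [ "$UCX_DEV" = "$MLX5_DEV" ]; then
        # No second device: do not pin UCX, and use the alternative
        # from the thread (the two DM_COUNT=0 settings) instead.
        ucx_opts="-x UCX_RC_MLX5_DM_COUNT=0 -x UCX_DC_MLX5_DM_COUNT=0"
    else
        # Preferred option: keep UCX on a different device than TL/MLX5.
        ucx_opts="-x UCX_NET_DEVICES=$UCX_DEV"
    fi
    # coll_ucc_enable 0 keeps Open-MPI from creating a second TL/MLX5 instance.
    echo "mpirun -x UCC_TL_MLX5_NET_DEVICES=$MLX5_DEV $ucx_opts" \
         "-x UCC_TL_MLX5_TUNE=inf --mca coll_ucc_enable 0" \
         "--map-by ppr:2:node -np 4 test/mpi/ucc_test_mpi -c alltoall"
}

build_cmd
```

Running it with UCX_DEV set to the same value as MLX5_DEV switches the output to the DM_COUNT variant, which is a quick way to compare the two command lines before launching a job.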
I hope this is useful. Let me know if you have further issues.
Thank you very much, it answers all my questions. Gonna go ahead and close the issue.
I am trying to mix TL/UCP and TL/MLX5: use TL/MLX5 for alltoall and TL/UCP for all other collective operations.
How I configure UCC:
"${UCC_SRC_DIR}/configure" --with-ucx="${UCX_HOME}" \
    --prefix="${UCC_INSTALL_DIR}" --with-mpi \
    --with-ibverbs \
    --with-rdmacm \
    --with-tls=self,shm,ucp,mlx5 \
Run command:
mpirun -x UCC_CLS=basic -x UCC_CL_BASIC_TLS=ucp,mlx5 -x UCC_TL_UCP_TUNE=alltoall:0 -x UCC_TL_MLX5_NET_DEVICES=mlx5_2:1 -np 4 ./ucc_test_mpi -c alltoall -o min
Then I also tested using TL/MLX5 only:
mpirun -x UCC_TLS=mlx5 -x UCC_TL_MLX5_NET_DEVICES=mlx5_2:1 -np 2 ./ucc_test_mpi -c alltoall
This also ran into the context creation (ctx create) problem.
Here are my ib_dev and bw test results:
Two Questions: