intel / torch-ccl

oneCCL Bindings for Pytorch*
BSD 3-Clause "New" or "Revised" License

CCL_ERROR problem #54

Open zzningxp opened 12 months ago

zzningxp commented 12 months ago
2023:11:28-16:35:45:(35980) |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2023:11:28-16:35:45:(35980) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2023:11:28-16:35:45:(35980) |CCL_ERROR| base_thread.cpp:36 start: error while creating worker thread #0 pthread_create returns 22
2023:11:28-16:35:45:(35980) |CCL_ERROR| exec.cpp:134 start_workers: condition workers.back()->start(cpu_affinity, mem_affinity) == ccl::status::success failed
failed to start worker # 0

RuntimeError: oneCCL: exec.cpp:134 start_workers: EXCEPTION: failed to start worker # 0
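
For reference when reading the log above: `pthread_create` reports failure by returning an error number, and on Linux the value 22 is `EINVAL` (invalid argument). POSIX documents `EINVAL` for `pthread_create` as invalid settings in the thread attributes; whether that is the actual cause here is an assumption, not something the log confirms. A minimal sketch for decoding the number, where `decode_errno` is a hypothetical helper name and not part of oneCCL or torch-ccl:

```python
import errno
import os


def decode_errno(code: int) -> tuple[str, str]:
    """Map a numeric error code to its symbolic name and the OS message.

    Hypothetical helper for reading logs; the mapping is platform
    dependent, so run it on the same machine that produced the log.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    message = os.strerror(code)
    return name, message


if __name__ == "__main__":
    # 22 is the value reported by "pthread_create returns 22" in the log.
    name, message = decode_errno(22)
    print(f"errno 22 -> {name}: {message}")
```

Since the failing call in the log is `workers.back()->start(cpu_affinity, mem_affinity)`, the affinity passed to the worker thread may be worth checking first, but that is only a guess from the log text.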