Hi team, I have an example based on the latest NVIDIA image, nvcr.io/nvidia/tensorflow:24.07-tf2-py3, but I run the MPI job on different nodes. However, it complains that the launcher could not identify the worker. Is it supported to have the launcher and worker running on separate nodes?
Also, could you point me to the code that starts the worker? Thanks!