Closed Tomcli closed 3 months ago
/label feature request
/okay to test
Hi @Tomcli, it seems there are some code style issues in your code, which are causing CI to fail. I recommend running the pre-commit tool to check code style before committing, as described here: https://docs.rapids.ai/api/cuspatial/stable/developer_guide/contributing_guide/ . Could you run pre-commit and then commit again? Thanks. Or, if that is troublesome for you, I can open a PR and commit your code myself.
Thank you @linhu-nv for providing the link to the contributing guide. I fixed the license check and verified with my local pre-commit check.
No problem @Tomcli. @BradReesWork, could you please kick off the CI again? Thanks.
/okay to test
/merge
We have many users running the Kubeflow training operator who are also interested in using Wholegraph. Many of our MPIJob users still use HorovodRun as the startup command. Therefore, we want to add HorovodRun as one of the Wholegraph launch agents so that our users can run Wholegraph on top of Kubeflow.
The new function will be similar to the existing MPI launcher agent: the horovod library is imported only on demand. Because of an issue with horovod.torch (see https://github.com/horovod/horovod/issues/4009), horovod.tensorflow will be used instead, and solely for the Horovod initialization call. After Horovod is initialized, each rank can continue to run normal PyTorch code, just as it does with the mpi4py launcher.
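As a rough illustration of the on-demand import described above, here is a minimal sketch. It is not the code from this PR: `init_with_horovod` and `RankInfo` are hypothetical names used only for this example. The Horovod calls are assumed to be the standard `init()`, `rank()`, `size()`, `local_rank()` and `local_size()` functions exposed by `horovod.tensorflow`.

```python
import importlib
from typing import NamedTuple


class RankInfo(NamedTuple):
    """Hypothetical container for the rank layout reported by Horovod."""
    rank: int
    world_size: int
    local_rank: int
    local_size: int


def init_with_horovod() -> RankInfo:
    """Hypothetical helper: initialize Horovod and return this process's rank layout."""
    # Import on demand, so users of the other launch agents
    # do not need Horovod installed.
    hvd = importlib.import_module("horovod.tensorflow")
    # Only the initialization goes through horovod.tensorflow;
    # the caller can run normal PyTorch code afterwards.
    hvd.init()
    return RankInfo(hvd.rank(), hvd.size(), hvd.local_rank(), hvd.local_size())
```

A script using such a helper would typically be started with something like `horovodrun -np 4 python train.py`, and each rank could then pass the returned values on to its usual PyTorch and Wholegraph setup.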
fixes #201