kubeflow / mpi-operator

Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
https://www.kubeflow.org/docs/components/training/mpi/
Apache License 2.0
440 stars, 218 forks

Question: Is the network traffic of AllReduce (e.g., ML gradients) encrypted between workers? #645

Closed jsyqrt closed 4 months ago

jsyqrt commented 4 months ago

Background: I've been trying to launch an MPI cluster to train a deep learning model with PyTorch's DDP.

Question: Is the network traffic of AllReduce encrypted between workers?

alculquicondor commented 4 months ago

If using SSH, yes.

alculquicondor commented 4 months ago

Actually, I'm not sure if all the communication is encrypted. But this is not a question for the operator. This is a question for the MPI implementation (OpenMPI, Intel, MPICH, etc).

/close

google-oss-prow[bot] commented 4 months ago

@alculquicondor: Closing this issue.

jsyqrt commented 4 months ago

Hi, I'm using Intel MPI, and it does not seem to support communication encryption.

So I'm curious whether it is possible to encrypt all MPI traffic with Istio or other service mesh tools in Kubeflow?

alculquicondor commented 4 months ago

I think someone tried some time ago, but SSH didn't work well with it.

rongou commented 4 months ago

SSH is only used for the initial setup; the actual MPI traffic is not encrypted, by design: https://stackoverflow.com/questions/6346873/how-do-mpi-implementations-openmpi-mpich-handle-security-authentication

jsyqrt commented 4 months ago

Istio says it will protect all traffic between pods; check this.

Does this apply to Kubeflow and the MPI traffic in it? If so, maybe it's a good idea to use Istio together with Kubeflow.
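
For context, Istio's pod-to-pod encryption is usually enforced with a `PeerAuthentication` policy in `STRICT` mutual-TLS mode. Below is a minimal sketch that builds such a manifest in Python. It assumes Istio is installed and sidecar injection is enabled in the target namespace; the namespace name `mpi-training` and the helper `strict_mtls_policy` are hypothetical, and nothing in this thread confirms that the policy works with mpi-operator.

```python
import json


def strict_mtls_policy(namespace: str) -> dict:
    """Build an Istio PeerAuthentication manifest that requires mTLS
    for all sidecar-injected workloads in the given namespace."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        # A policy named "default" with no selector applies namespace-wide.
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


if __name__ == "__main__":
    # "mpi-training" is a hypothetical namespace used only for illustration.
    print(json.dumps(strict_mtls_policy("mpi-training"), indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML. Whether the MPI launcher and workers can still reach each other once the sidecars intercept traffic is a separate question that this sketch does not answer.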

tenzen-y commented 4 months ago

The current mpi-operator does not support Istio, as you can see here: https://github.com/kubeflow/mpi-operator/issues/480

jsyqrt commented 4 months ago

Hi @tenzen-y, thanks for the information about Istio! So is it possible to use a service mesh with mpi-operator to protect MPI CCL traffic?

alculquicondor commented 4 months ago

It might be possible, but nobody has reported doing it so far.