kubeflow / mpi-operator

Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
https://www.kubeflow.org/docs/components/training/mpi/
Apache License 2.0

Add support for linux/arm64 and linux/ppc64le for MPICH #565

Open sheevy opened 1 year ago

sheevy commented 1 year ago

Currently we support OpenMPI on amd64, arm64 and ppc64le, but Intel MPI and MPICH are only supported on amd64. It would be great if we had feature parity in that regard.

alculquicondor commented 1 year ago

Last time I checked, Intel MPI only worked on amd64 (the libraries are not available for other architectures).

tenzen-y commented 1 year ago

If MPICH doesn't support other platforms, we can close this issue.

tenzen-y commented 1 year ago

/retitle Add support for linux/arm64 and linux/ppc64le for MPICH

sheevy commented 1 year ago

> If MPICH doesn't support other platforms, we can close this issue.

I think it supports both amd64 and ppc64le, e.g. see: https://koji.mbox.centos.org/koji/buildinfo?buildID=20381

tenzen-y commented 10 months ago

I confirmed that MPICH works on the linux/arm64 platform using the ytenzen/mpi-pi:test-mpich image.

$ kubectl get nodes --show-labels 
NAME                 STATUS   ROLES           AGE     VERSION   LABELS
kind-control-plane   Ready    control-plane   6m26s   v1.27.1   beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64...

However, I cannot confirm whether MPICH works on the linux/ppc64le platform, since I don't have a ppc64le environment.
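
As a rough sketch of the architecture check above: the node's architecture is carried in the kubernetes.io/arch label, so it can be pulled out of a LABELS string with standard tools. The helper name arch_from_labels is hypothetical, and the sample string is only the visible part of the truncated output shown above.

```shell
# Hypothetical helper (not part of mpi-operator): print the value of the
# kubernetes.io/arch label from a comma-separated label string, as found
# in the LABELS column of `kubectl get nodes --show-labels`.
arch_from_labels() {
  # Split on commas, then keep only the line that starts with the label key.
  printf '%s\n' "$1" | tr ',' '\n' | sed -n 's|^kubernetes\.io/arch=||p'
}

# Sample input: the visible prefix of the label list from the output above.
labels='beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64'
arch_from_labels "$labels"
```

On a live cluster, `kubectl get nodes -L kubernetes.io/arch` shows the same label as its own column, which avoids parsing the full label list.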

eero-t commented 9 months ago

> However, I cannot confirm whether MPICH works on the linux/ppc64le platform, since I don't have a ppc64le environment.

As this is marked for release 0.5.0, maybe ARM is enough for that, and PPC support can be split into a separate ticket to come later?

tenzen-y commented 9 months ago

> However, I cannot confirm whether MPICH works on the linux/ppc64le platform, since I don't have a ppc64le environment.

> As this is marked for release 0.5.0, maybe ARM is enough for that, and PPC support can be split into a separate ticket to come later?

I agree. If you can verify MPICH on ppc64le, please let me know the result.