kubeflow / mpi-operator

Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
https://www.kubeflow.org/docs/components/training/mpi/
Apache License 2.0

OpenMPI 4.1.5 #588

Open bdevcich opened 10 months ago

bdevcich commented 10 months ago

Any plans to update to the latest stable version of OpenMPI?

alculquicondor commented 10 months ago

It should just work. Feel free to send a PR.

tenzen-y commented 10 months ago

IIRC, we don't specify the OpenMPI version. So just rebuilding the image might be enough.

https://github.com/kubeflow/mpi-operator/blob/6bce22d1ab3a42e7ea52ef573afbc662a49c18a5/build/base/openmpi.Dockerfile#L6

bdevcich commented 10 months ago

It appears that 4.1.0 is the latest version that will be provided with bullseye: https://packages.debian.org/bullseye/openmpi-bin. I didn't see a newer package in the updates either.

It looks like bookworm has 4.1.4: https://packages.debian.org/bookworm/openmpi-bin

Any concerns with building openmpi from source?

tenzen-y commented 10 months ago

I see.

> Any concerns with building openmpi from source?

I want to avoid building OpenMPI from source so that we don't increase maintenance costs.

> It looks like bookworm has 4.1.4: https://packages.debian.org/bookworm/openmpi-bin

Actually, we already have a PR to update the Debian version, although there are unresolved issues: https://github.com/kubeflow/mpi-operator/pull/573

Can you try to update the Debian version instead of building OpenMPI?

WDYT? @alculquicondor @terrytangyuan

alculquicondor commented 10 months ago

Yes, I'd prefer to update the Debian version.

terrytangyuan commented 10 months ago

Agreed. We should not build it from source.

bdevcich commented 10 months ago

Thanks.

Makes sense. Building from source adds some complexity. I think 4.1.4 (which comes with bookworm) will be fine, as it contains the fix that we're interested in.

So the path forward is to get traction on #573?

tenzen-y commented 10 months ago

> Thanks.
>
> Makes sense. Building from source adds some complexity. I think 4.1.4 (which comes with bookworm) will be fine, as it contains the fix that we're interested in.
>
> So the path forward is to get traction on #573?

I think you can open a new PR :)

alculquicondor commented 10 months ago

In any case, you should be able to run any version of OpenMPI with the operator. You can build your containers against bookworm (or any other distro).
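
A minimal sketch of what such a user-built image could look like, assuming the stock `debian:bookworm` base image and the `openmpi-bin` package mentioned above. This is an illustration, not the operator's own Dockerfile: any SSH setup or other configuration that the operator's images provide is left out here and would still be needed.

```dockerfile
# Hypothetical user image: picks up whatever OpenMPI version bookworm ships
# (4.1.4 according to the package page linked earlier in this thread).
FROM debian:bookworm

# No version is pinned, so the OpenMPI release follows the base distro.
RUN apt-get update \
    && apt-get install -y --no-install-recommends openmpi-bin \
    && rm -rf /var/lib/apt/lists/*
```

Running `ompi_info --version` inside the resulting container should report which OpenMPI release was actually installed.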

abeltre1 commented 8 months ago

@bdevcich @tenzen-y is there a reason why we do not install from source?

tenzen-y commented 7 months ago

> @bdevcich @tenzen-y is there a reason why we do not install from source?

As I mentioned above (https://github.com/kubeflow/mpi-operator/issues/588#issuecomment-1684121799), we should avoid increasing maintenance costs.