kubeflow / mpi-operator

Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
https://www.kubeflow.org/docs/components/training/mpi/
Apache License 2.0

Specify the platforms for building image #532

Closed tenzen-y closed 1 year ago

tenzen-y commented 1 year ago

I specified the platforms used to build the images, since not every image can be built for every platform. For example, build/base/intel-builder.Dockerfile cannot be built for linux/arm64:

$ docker build --platform linux/arm64 -t mpioperator/intel-builder build/base -f build/base/intel-builder.Dockerfile
[+] Building 13.8s (11/11) FINISHED
...
#11 8.184 Reading package lists...
#11 8.542 Building dependency tree...
#11 8.607 Reading state information...
#11 8.664 E: Unable to locate package intel-oneapi-compiler-dpcpp-cpp
#11 8.664 E: Unable to locate package intel-oneapi-mpi-devel
------
executor failed running [/bin/sh -c apt update     && apt install -y --no-install-recommends gnupg2 ca-certificates     && apt-key add /tmp/key.PUB     && rm /tmp/key.PUB     && echo "deb https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list     && apt remove -y gnupg2 ca-certificates     && apt autoremove -y     && apt update     && apt install -y --no-install-recommends         libstdc++-8-dev binutils procps clang         intel-oneapi-compiler-dpcpp-cpp         intel-oneapi-mpi-devel     && rm -rf /var/lib/apt/lists/*]: exit code: 100
...

tenzen-y commented 1 year ago

> This will break the build on Mac?

Yes, I hit the error above on an M1 Mac. However, the same error also occurs on an x86 machine when the --platform=linux/arm64 flag is passed, as in the following command:

docker build --platform linux/arm64 -t mpioperator/intel-builder build/base -f build/base/intel-builder.Dockerfile

tenzen-y commented 1 year ago

In the long term, it would be better to support linux/arm64 in all images.

alculquicondor commented 1 year ago

> For example, we can not build build/base/intel-builder.Dockerfile for linux/arm64.

So this PR is breaking the build of the intel-builder image? Or I'm not understanding the PR description.

tenzen-y commented 1 year ago

>> For example, we can not build build/base/intel-builder.Dockerfile for linux/arm64.
>
> So this PR is breaking the build of the intel-builder image? Or I'm not understanding the PR description.

@alculquicondor I hit the error above when running the make test_images command, since I'm using an M1 Mac (an arm64 machine) locally. So I created this PR so that we can develop on arm64 machines (e.g., M1 Mac).

Does that make sense?
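
To illustrate the idea, here is a minimal shell sketch. It is not the actual change in this PR: the `PLATFORMS` variable name, the `host_platform` and `build_cmd` helpers, and the default of linux/amd64 are assumptions made for the example. The build command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: map an architecture name (as reported by `uname -m`)
# to a Docker platform string.
host_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Assumed variable name: use linux/amd64 unless the developer overrides it.
PLATFORMS="${PLATFORMS:-linux/amd64}"

# Hypothetical helper: print the docker build command for a platform,
# an image tag, and a Dockerfile path (printing keeps the sketch side-effect free).
build_cmd() {
  echo "docker build --platform $1 -t $2 build/base -f $3"
}
```

For example, `build_cmd "$(host_platform "$(uname -m)")" mpioperator/intel-builder build/base/intel-builder.Dockerfile` prints the command for the host architecture, while passing `"$PLATFORMS"` as the first argument uses the configured default or override instead.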

alculquicondor commented 1 year ago

ah ok, so by default we build on linux/amd64, but you will now have the ability to change that to linux/arm64 for your own development.

I didn't notice the different characters at first glance :)

/lgtm /approve

google-oss-prow[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/kubeflow/mpi-operator/blob/master/OWNERS)~~ [alculquicondor]

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment