tenzen-y closed this 1 year ago.
blocked on #542.
The build/base/intel.Dockerfile seems to be broken.
Maybe we need to fix the Dockerfile.
Fetched 10.2 kB in 0s (20.4 kB/s)
Reading package lists...
E: Failed to fetch https://apt.repos.intel.com/oneapi/dists/all/main/binary-amd64/Packages.bz2  File has unexpected size (265446 != 461276). Mirror sync in progress? [IP: 184.87.69.109 443]
   Hashes of expected file:
    - Filesize:461276 [weak]
    - SHA512:b57998a876a5016443cc926dcd890a47c0e579b64a87b5fed7566bf03e403a352c1c04bee9493016927f9fa3001d1faffc725b34174daa5f427a02feb86f650f
    - SHA256:20d2c9441b5b7b725b3105bb552c5be21c8a4562ba4985ab2794d78f0d5aad23
    - SHA1:289a921381b794a7e96f270244f4d2b18ae55d90 [weak]
    - MD5Sum:11d742e8223bc078a46da9984296f744 [weak]
   Release file created at: Mon, 27 Mar 2023 16:38:50 +0000
E: Failed to fetch https://apt.repos.intel.com/oneapi/dists/all/main/binary-all/Packages.bz2
E: Some index files failed to download. They have been ignored, or old ones used instead.
The command '/bin/sh -c apt update && apt install -y --no-install-recommends gnupg2 ca-certificates && apt-key add /tmp/key.PUB && rm /tmp/key.PUB && echo "deb https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list && apt remove -y gnupg2 ca-certificates && apt autoremove -y && apt update && apt install -y --no-install-recommends dnsutils intel-oneapi-mpi && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100
make: *** [Makefile:107: test_images] Error 100
https://github.com/kubeflow/mpi-operator/actions/runs/4577714013/jobs/8083448553#step:4:1601
I can build the image locally, so that error seems to be temporary.
If you force-push, it should trigger a rerun.
Maybe we need to wait for the upstream repository error to be fixed.
Is this ready for review now?
I'm still working on it.
@alculquicondor Thanks for your patience. This PR is ready for review. PTAL :)
Oh, this is a bug. I will create a separate PR to fix it.
W0403 20:47:56.968863 15661 podgroup.go:314] Ignore replica "Launcher" priority class "non-existence": priorityclass.scheduling.k8s.io "non-existence" not found
podgroup_test.go:624: Unexpected calculatePGMinResources for the scheduler-plugins (-want,+got):
&v1.ResourceList{
- s"cpu": {i: resource.int64Amount{value: 7}, s: "7", Format: "DecimalSI"},
+ s"cpu": {i: resource.int64Amount{value: 12}, Format: "DecimalSI"},
- s"memory": {i: resource.int64Amount{value: 19327352832}, s: "18Gi", Format: "BinarySI"},
+ s"memory": {i: resource.int64Amount{value: 36507222016}, Format: "BinarySI"},
}
https://github.com/kubeflow/mpi-operator/actions/runs/4601155665/jobs/8128664833?pr=540#step:8:208
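For reading the diff: `18Gi` is 18 × 2^30 = 19327352832 bytes, and the unexpected 36507222016 bytes is exactly 34Gi. Since the test only fails on some runs, it is worth remembering that Go randomizes map iteration order, so any sum that stops after `minMember` pods while ranging over a map can vary between runs. Whether that is the actual cause here is not established in this thread. The sketch below is a hypothetical, simplified model (not mpi-operator's `calculatePGMinResources`) that sorts replica types by priority, then name, before accumulating; treating a missing priority class as priority 0 is also an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

const gi = int64(1) << 30 // bytes in one GiB; 18Gi == 19327352832

// replica is a hypothetical, simplified view of one MPIJob replica type.
type replica struct {
	Count    int32
	Priority int32 // assumed 0 when the priority class cannot be found
	CPU      int64 // cores requested per pod
	Memory   int64 // bytes requested per pod
}

type resources struct{ CPU, Memory int64 }

// minResources sums requests for the first minMember pods, visiting replica
// types by descending priority and then by name. Sorting the keys keeps the
// result stable; ranging over the map directly visits keys in random order.
func minResources(replicas map[string]replica, minMember int32) resources {
	names := make([]string, 0, len(replicas))
	for name := range replicas {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		pi, pj := replicas[names[i]].Priority, replicas[names[j]].Priority
		if pi != pj {
			return pi > pj
		}
		return names[i] < names[j]
	})
	var total resources
	remaining := minMember
	for _, name := range names {
		if remaining <= 0 {
			break
		}
		r := replicas[name]
		n := r.Count
		if n > remaining {
			n = remaining
		}
		total.CPU += int64(n) * r.CPU
		total.Memory += int64(n) * r.Memory
		remaining -= n
	}
	return total
}

func main() {
	replicas := map[string]replica{ // hypothetical inputs, not the test's
		"Launcher": {Count: 1, Priority: 0, CPU: 1, Memory: 2 * gi},
		"Worker":   {Count: 4, Priority: 100, CPU: 2, Memory: 4 * gi},
	}
	fmt.Println(minResources(replicas, 3)) // {6 12884901888}
}
```

With `minMember` 3, the three higher-priority workers are always chosen, so the result is the same on every run.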
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alculquicondor
/hold
/hold cancel
@alculquicondor squashed.
@alculquicondor Can you add the lgtm label to this PR?
/lgtm
Oh, this is a bug...
So the test passes sometimes?
Yes, our unit tests sometimes pass. Please take a look at #543.
I implemented E2E tests for the integration with scheduler-plugins.
Part-of: #500
NOTE: This test will still fail because I forgot to implement the logic to update the PodGroup when mpiJob.spec.runPolicy.schedulingPolicy is updated, so we must implement that logic first in another PR. Resolved in: #542
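The missing piece described in the note is a reconcile step: derive the desired PodGroup spec from the current scheduling policy and update the existing PodGroup when they differ. A minimal Go sketch under that assumption follows; `schedulingPolicy`, `podGroupSpec`, `desiredSpec`, and `syncPodGroup` are hypothetical simplified types and helpers, not the real mpi-operator or scheduler-plugins API, and the actual fix in #542 may differ.

```go
package main

import "fmt"

// schedulingPolicy is a hypothetical stand-in for the fields of
// mpiJob.spec.runPolicy.schedulingPolicy that feed the PodGroup.
type schedulingPolicy struct {
	MinAvailable           int32
	PriorityClass          string
	ScheduleTimeoutSeconds int32
}

// podGroupSpec is a hypothetical, simplified PodGroup spec.
type podGroupSpec struct {
	MinMember              int32
	PriorityClassName      string
	ScheduleTimeoutSeconds int32
}

// desiredSpec maps the scheduling policy onto the PodGroup fields.
func desiredSpec(p schedulingPolicy) podGroupSpec {
	return podGroupSpec{
		MinMember:              p.MinAvailable,
		PriorityClassName:      p.PriorityClass,
		ScheduleTimeoutSeconds: p.ScheduleTimeoutSeconds,
	}
}

// syncPodGroup overwrites the existing spec when it has drifted from the
// desired one and reports whether an update call would be needed.
func syncPodGroup(existing *podGroupSpec, p schedulingPolicy) bool {
	want := desiredSpec(p)
	if *existing == want {
		return false // already up to date; skip the API update
	}
	*existing = want
	return true
}

func main() {
	pg := desiredSpec(schedulingPolicy{MinAvailable: 2})
	policy := schedulingPolicy{MinAvailable: 3, ScheduleTimeoutSeconds: 60}
	fmt.Println(syncPodGroup(&pg, policy), pg.MinMember) // true 3
	fmt.Println(syncPodGroup(&pg, policy))               // false
}
```

Comparing the whole spec and returning early when nothing changed avoids issuing an update on every reconcile loop.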