Closed tenzen-y closed 1 year ago
/cc @terrytangyuan @alculquicondor @gaocegege @zw0610
Sounds good to me
@tenzen-y @terrytangyuan Wondering what is the estimated release date for this task? Our company depends on mpi-operator v2. I can also help on a few if needed :)
@tenzen-y @alculquicondor Any estimates on those pending issues? Perhaps @ByronHsu could help with some of those.
@ByronHsu We have yet to set a release date for 0.4.0. However, progress has been good.
> I can also help on a few if needed
Thanks.
Also, we cannot work on #505 yet, since it depends on https://github.com/kubernetes-sigs/kueue/issues/360.
However, I'm open to other tasks not mentioned above!
we can leave #505 to the kueue repo as well
As another option, we might be able to include the Kueue-related enhancements after the 0.4.0 release (0.5.0?).
Sounds good! Thanks for the amazing effort!
It would be better to include #521 in MPI Operator v0.4.0.
Releasing 0.4.0 will help with the Kueue-MPI integration: https://github.com/kubernetes-sigs/kueue/issues/65. Given the decision that the integration happens inside Kueue, Kueue needs a dependency on the mpi-operator. For now, I have drafted the integration (https://github.com/kubernetes-sigs/kueue/pull/578) against the master branch of the mpi-operator, so this is not blocking progress, but at some point we need to switch to a released version.
cc @alculquicondor @mwielgus
We are pretty much ready for a release.
@terrytangyuan how can we do a release? I remember we had to upload images, but now I think that's not necessary. Although tags might still be necessary. What else do we need?
Ah, this also needs to be updated https://github.com/kubeflow/mpi-operator/blob/master/RELEASE.md
@tenzen-y could you take it?
> We are pretty much ready for a release.
> @terrytangyuan how can we do a release? I remember we had to upload images, but now I think that's not necessary. Although tags might still be necessary. What else do we need?
@alculquicondor We also need to add e2e tests for the coscheduling plugins (#500) before releasing v0.4.0, so I will update the changelog once the e2e tests are implemented.
We should release through GitHub Releases (in the UI). Yes, please update the release notes.
Note that we probably need to either create CI pipelines that build the example images, or build those images manually on a local machine and push them to the registry, before we cut a new release.
Yep those should be automated. Here's a reference GitHub Action that we can borrow, e.g. docker image push and GitHub release. https://github.com/argoproj/argo-workflows/blob/master/.github/workflows/release.yaml
Created an issue: #541
Can we manually create the images for this release?
Are we missing anything else for the release?
> Can we manually create the images for this release?
I don't have permission to publish images to Docker Hub, although I can build the images locally.
> Are we missing anything else for the release?
I'm working on fixing the below bug:
Oh, this is a bug...
I will create a separate PR to fix that.
W0403 20:47:56.968863 15661 podgroup.go:314] Ignore replica "Launcher" priority class "non-existence": priorityclass.scheduling.k8s.io "non-existence" not found
podgroup_test.go:624: Unexpected calculatePGMinResources for the scheduler-plugins (-want,+got):
&v1.ResourceList{
- s"cpu": {i: resource.int64Amount{value: 7}, s: "7", Format: "DecimalSI"},
+ s"cpu": {i: resource.int64Amount{value: 12}, Format: "DecimalSI"},
- s"memory": {i: resource.int64Amount{value: 19327352832}, s: "18Gi", Format: "BinarySI"},
+ s"memory": {i: resource.int64Amount{value: 36507222016}, Format: "BinarySI"},
}
https://github.com/kubeflow/mpi-operator/actions/runs/4601155665/jobs/8128664833?pr=540#step:8:208
https://github.com/kubeflow/mpi-operator/pull/540#issuecomment-1496012813
Also, we might need to create a CHANGELOG, as you mentioned.
I do have permissions. Once you give me the green light, I could build and upload.
> I do have permissions. Once you give me the green light, I could build and upload.
Great!
Note that to support multiple architectures, we must specify the platforms when we build the operator image:
$ make images PLATFORMS=linux/amd64,linux/arm64,linux/ppc64le
Also need to run with IMG_BUILDER="docker buildx". However, the base images need some versioning. I'll work on this tomorrow.
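Putting the two notes above together, the multi-architecture build might be invoked roughly as in the sketch below. It assumes the script runs from the root of an mpi-operator checkout and that `docker buildx` is installed with a builder that supports the listed platforms. `DRY_RUN` and `build_cmd` are hypothetical helpers for this sketch only, not Makefile options; by default the script only prints the command.

```shell
#!/bin/sh
# Sketch only: combines the PLATFORMS and IMG_BUILDER settings mentioned above.
# Assumption: run from the root of an mpi-operator checkout.
PLATFORMS="linux/amd64,linux/arm64,linux/ppc64le"
IMG_BUILDER="docker buildx"

# DRY_RUN is a hypothetical variable for this sketch (not part of the Makefile).
# With DRY_RUN=1, the default here, the command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the make invocation that would be run.
build_cmd() {
  printf 'make images PLATFORMS=%s IMG_BUILDER="%s"\n' "$PLATFORMS" "$IMG_BUILDER"
}

if [ "$DRY_RUN" = "1" ]; then
  build_cmd
else
  # Command-line variable assignments override the Makefile's defaults.
  make images PLATFORMS="$PLATFORMS" IMG_BUILDER="$IMG_BUILDER"
fi
```

Running it with `DRY_RUN=0` performs the actual build; pushing the resulting images still requires registry permissions.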
Released v0.4.0 🎉
https://github.com/kubeflow/mpi-operator/releases/tag/v0.4.0
Only https://github.com/kubeflow/website/pull/3453 remains.
All tasks are completed! Thanks to everyone!
/close
@tenzen-y: Closing this issue.
Maybe we want to cut a new mpi-operator release once we have completed the following tasks:
#505 (We will work on the Kueue side: https://github.com/kubernetes-sigs/kueue/pull/578)