rimolive closed this issue 2 months ago
@rimolive can you remove @DnPlas from Charmed Kubeflow and replace her with myself? ty!
to your questions, for Charmed Kubeflow:
@rimolive deployKF will participate in 1.9, but it's not 100% clear exactly what that will look like.
Separately, given "Kubeflow on AWS" did not participate in 1.8, and announced they were no longer supporting their distribution in https://github.com/awslabs/kubeflow-manifests/issues/794, I think it's unlikely they will do 1.9?
Given this, I proposed moving them to "legacy" on the Kubeflow website on this PR https://github.com/kubeflow/website/pull/3641.
However, I also want to avoid confusion with users, because they might think that Kubeflow no longer supports AWS due to the "Kubeflow on AWS" name. So I also think we should merge https://github.com/kubeflow/website/pull/3643 at the same time, which tells users that "Kubeflow on XXXX" is just a name, and NOT the ONLY way to use Kubeflow on that platform.
For IBM IKS:
Are you planning on having your distro ready in sync with the KF 1.9 release?
Yes
Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?
Yes
For VMware Distro:
Are you planning on having your distro ready in sync with the KF 1.9 release?
Yes
Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?
Yes
For QBO Distro:
Are you planning on having your distro ready in sync with the KF 1.9 release?
Yes
Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?
Yes
Calling all Distribution owners! I'm proud to announce our first Release Candidate for Kubeflow 1.9!
You can find the release details in the following URL:
https://github.com/kubeflow/manifests/releases/tag/v1.9.0-rc.0
We'll be working on another Release Candidate when we have Notebooks and KServe Models Webapp updated and ready for KF 1.9. We can use this issue to keep track of blocker issues for distributions while we work on fixing them.
cc @ca-scribner @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard
We also have to update cert-manager, knative, istio, seldon, bentoml, etc., which will come in later RCs.
@ca-scribner @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard Can you please acknowledge that you are aware of Kubeflow 1.9 RC0 and that the distribution testing phase has started? Please react with a thumbs up if everything is okay from your side and you are proceeding with testing.
deployKF is mostly waiting on the updates from Notebooks (https://github.com/kubeflow/kubeflow/issues/7453), but I am aware that a 1.9.0-RC0 was cut with other components.
What do we mean by '(around 1.28)' here: https://github.com/kubeflow/manifests/tree/v1.9.0-rc.0?tab=readme-ov-file#prerequisites
Is that v1.28.0 and v1.27.11?
I'm proceeding with the testing in QBO.
OK: Everything is looking good in QBO. Tested by doing a vector addition test.
Details:
git branch
* (HEAD detached at v1.9.0-rc.0)
In Kubernetes v1.28.0:
qbo get nodes kubeflow_v1_9_0_nvidia | jq .nodes[]?.image
"kindest/node:v1.28.0"
"kindest/node:v1.28.0"
"kindest/node:v1.28.0"
with NVIDIA GPU Operator
helm list -n gpu-operator
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/alex/.qbo/kubeflow_v1_9_0_nvidia.conf
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/alex/.qbo/kubeflow_v1_9_0_nvidia.conf
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
gpu-operator-1715634796 gpu-operator 1 2024-05-13 21:13:18.636880948 +0000 UTC deployed gpu-operator-v24.3.0 v24.3.0
And Kustomize
./kustomize version
v5.4.1
It looks like platform-agnostic-multi-user-pns is no longer available:
./kustomize build apps/pipeline/upstream/env/platform-agnostic-multi-user-pns | kubectl apply -f -
as per https://github.com/kubeflow/pipelines/issues/5285
So I used the following instead. I'll update the QBOT installer for this version
./kustomize build apps/pipeline/upstream/env/platform-agnostic-multi-user | kubectl apply -f -
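The fallback described above can be sketched as a small shell helper. This is only an illustration, not part of the QBO installer: `pick_pipeline_env` is a hypothetical name, and the order of the candidates (try the `-pns` overlay first, then fall back to `platform-agnostic-multi-user`) is an assumption based on the two commands shown in this comment.

```shell
# Hypothetical helper: print the first Kubeflow Pipelines overlay that exists
# under a kubeflow/manifests checkout rooted at $1. The candidate list is an
# assumption taken from the two overlays mentioned in this thread.
pick_pipeline_env() {
  local root="$1" env
  for env in platform-agnostic-multi-user-pns platform-agnostic-multi-user; do
    if [ -d "$root/apps/pipeline/upstream/env/$env" ]; then
      printf '%s\n' "apps/pipeline/upstream/env/$env"
      return 0
    fi
  done
  # Neither overlay was found under the given root.
  return 1
}

# Usage, from the root of the manifests checkout:
#   ./kustomize build "$(pick_pipeline_env .)" | kubectl apply -f -
```

With a v1.9.0-rc.0 checkout, where the `-pns` directory is gone, this would resolve to the `platform-agnostic-multi-user` overlay used above.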
This is what was deployed:
kubectl get pods --all-namespaces -o jsonpath="{..image}" | sed 's/ /\n/g' | sort | uniq
docker.io/istio/pilot:1.17.5
docker.io/istio/proxyv2:1.17.5
docker.io/kindest/kindnetd:v20220726-ed811e41
docker.io/kindest/local-path-provisioner:v0.0.22-kind.0
docker.io/kserve/kserve-controller:v0.12.1
docker.io/kserve/models-web-app:v0.10.0
docker.io/kubeflow/training-operator:v1-f8f7363
docker.io/kubeflowkatib/katib-controller:v0.17.0-rc.0
docker.io/kubeflowkatib/katib-db-manager:v0.17.0-rc.0
docker.io/kubeflowkatib/katib-ui:v0.17.0-rc.0
docker.io/kubeflownotebookswg/centraldashboard:v1.8.0
docker.io/kubeflownotebookswg/jupyter-scipy:v1.8.0
docker.io/kubeflownotebookswg/jupyter-web-app:v1.8.0
docker.io/kubeflownotebookswg/kfam:v1.8.0
docker.io/kubeflownotebookswg/notebook-controller:v1.8.0
docker.io/kubeflownotebookswg/poddefaults-webhook:v1.8.0
docker.io/kubeflownotebookswg/profile-controller:v1.8.0
docker.io/kubeflownotebookswg/pvcviewer-controller:v1.8.0
docker.io/kubeflownotebookswg/tensorboard-controller:v1.8.0
docker.io/kubeflownotebookswg/tensorboards-web-app:v1.8.0
docker.io/kubeflownotebookswg/volumes-web-app:v1.8.0
docker.io/library/mysql:8.0.29
docker.io/library/python:3.7
docker.io/metacontrollerio/metacontroller:v2.0.4
gcr.io/knative-releases/knative.dev/eventing/cmd/controller@sha256:92967bab4ad8f7d55ce3a77ba8868f3f2ce173c010958c28b9a690964ad6ee9b
gcr.io/knative-releases/knative.dev/eventing/cmd/webhook@sha256:ebf93652f0254ac56600bedf4a7d81611b3e1e7f6526c6998da5dd24cdc67ee1
gcr.io/knative-releases/knative.dev/net-istio/cmd/controller@sha256:421aa67057240fa0c56ebf2c6e5b482a12842005805c46e067129402d1751220
gcr.io/knative-releases/knative.dev/net-istio/cmd/webhook@sha256:bfa1dfea77aff6dfa7959f4822d8e61c4f7933053874cd3f27352323e6ecd985
gcr.io/knative-releases/knative.dev/serving/cmd/activator@sha256:c2994c2b6c2c7f38ad1b85c71789bf1753cc8979926423c83231e62258837cb9
gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler@sha256:8319aa662b4912e8175018bd7cc90c63838562a27515197b803bdcd5634c7007
gcr.io/knative-releases/knative.dev/serving/cmd/controller@sha256:98a2cc7fd62ee95e137116504e7166c32c65efef42c3d1454630780410abf943
gcr.io/knative-releases/knative.dev/serving/cmd/domain-mapping-webhook@sha256:7368aaddf2be8d8784dc7195f5bc272ecfe49d429697f48de0ddc44f278167aa
gcr.io/knative-releases/knative.dev/serving/cmd/domain-mapping@sha256:f66c41ad7a73f5d4f4bdfec4294d5459c477f09f3ce52934d1a215e32316b59b
gcr.io/knative-releases/knative.dev/serving/cmd/webhook@sha256:4305209ce498caf783f39c8f3e85dfa635ece6947033bf50b0b627983fd65953
gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1
gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
gcr.io/ml-pipeline/api-server:2.2.0
gcr.io/ml-pipeline/cache-deployer:2.2.0
gcr.io/ml-pipeline/cache-server:2.2.0
gcr.io/ml-pipeline/frontend:2.2.0
gcr.io/ml-pipeline/metadata-envoy:2.2.0
gcr.io/ml-pipeline/metadata-writer:2.2.0
gcr.io/ml-pipeline/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance
gcr.io/ml-pipeline/mysql:8.0.26
gcr.io/ml-pipeline/persistenceagent:2.2.0
gcr.io/ml-pipeline/scheduledworkflow:2.2.0
gcr.io/ml-pipeline/viewer-crd-controller:2.2.0
gcr.io/ml-pipeline/visualization-server:2.2.0
gcr.io/ml-pipeline/workflow-controller:v3.4.16-license-compliance
gcr.io/tfx-oss-public/ml_metadata_store_server:1.14.0
ghcr.io/dexidp/dex:v2.36.0
kserve/kserve-controller:v0.12.1
kserve/models-web-app:v0.10.0
kubeflow/training-operator:v1-f8f7363
kubeflownotebookswg/jupyter-scipy:v1.8.0
mysql:8.0.29
nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.3.0
nvcr.io/nvidia/gpu-operator:v24.3.0
nvcr.io/nvidia/k8s-device-plugin:v0.15.0-ubi8
nvcr.io/nvidia/k8s/container-toolkit:v1.15.0-ubuntu20.04
nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04
python:3.7
quay.io/jetstack/cert-manager-cainjector:v1.12.2
quay.io/jetstack/cert-manager-controller:v1.12.2
quay.io/jetstack/cert-manager-webhook:v1.12.2
quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
registry.k8s.io/coredns/coredns:v1.10.1
registry.k8s.io/etcd:3.5.9-0
registry.k8s.io/kube-apiserver:v1.28.0
registry.k8s.io/kube-controller-manager:v1.28.0
registry.k8s.io/kube-proxy:v1.28.0
registry.k8s.io/kube-scheduler:v1.28.0
registry.k8s.io/nfd/node-feature-discovery:v0.15.4
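To spot components in a list like the one above that are still on an older tag (for example the Notebooks images at v1.8.0), a small filter can help. This is a minimal sketch; `images_with_tag` is a hypothetical helper name, and it assumes one image reference per line on standard input.

```shell
# Hypothetical helper: read image references on stdin (one per line) and
# print, de-duplicated, those whose tag is exactly $1. References pinned by
# digest (name@sha256:...) never match, because their last ":"-separated
# field is the digest rather than a tag.
images_with_tag() {
  awk -v tag="$1" '{
    n = split($0, parts, ":")
    if (n > 1 && parts[n] == tag) print
  }' | sort -u
}

# Usage against a running cluster (same jsonpath query as above):
#   kubectl get pods --all-namespaces -o jsonpath="{..image}" \
#     | tr ' ' '\n' | images_with_tag v1.8.0
```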
@alexeadem please check the updated release notes https://github.com/kubeflow/manifests/releases/tag/v1.9.0-rc.0: Kubernetes 1.27-1.29 is officially supported. And yes, we made emissary the default in 1.7 or 1.8.
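A pre-flight check for the 1.27-1.29 range quoted in this comment could look like the sketch below. `k8s_version_in_range` is a hypothetical helper, and the default bounds are simply the numbers from this thread passed as parameters; check the release notes for the range that applies to your tag.

```shell
# Hypothetical helper: succeed when a Kubernetes version such as "v1.28.0"
# has major version 1 and a minor version between $2 and $3 (inclusive).
# The defaults 27 and 29 are taken from the range quoted in this thread.
k8s_version_in_range() {
  local version="${1#v}"        # strip a leading "v" if present
  local min_minor="${2:-27}"
  local max_minor="${3:-29}"
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # minor version, without the patch part
  [ "$major" = "1" ] && [ "$minor" -ge "$min_minor" ] && [ "$minor" -le "$max_minor" ]
}

# Usage:
#   k8s_version_in_range v1.28.0 && echo "within the tested range"
```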
Hi @rimolive @StefanoFioravanzo, a couple of things:
- Could I please ask to replace @ca-scribner with me as the distribution owner?
Done
- We are aware that the distribution testing phase has started, but we have identified that components from the kubeflow/kubeflow repository are missing. Is this something coming in another RC? Is this planned?
We decided to move on with rc0 because many components were upgraded, but there's a plan for rc1 with the remaining components. Is there a specific one you are expecting to test?
Just an update: We have just released Kubeflow 1.9.0-rc.1, which includes all updates from the Notebooks WG, Istio 1.18.7 (targeting a full upgrade to 1.22 by the final release), and Model Registry 0.2.1-alpha. We ask all distributions to help test the new release and open issues so we can work with the Working Groups to fix them before the final release.
You can find the Release Notes on the releases page.
cc @ca-scribner @DnPlas @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard
Created an issue to track Nutanix distribution testing - https://github.com/nutanix/kubeflow-manifests/issues/21
We are one week away from the Kubeflow 1.9.0-rc.2 release, and we plan for it to be the last release candidate before the final release. We really welcome any updates about Distribution testing, including bug reports and anything else the release team should pursue for rc.2 or final.
cc @ca-scribner @DnPlas @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard
We will start testing within the next two weeks and will keep you posted.
Hello Distribution owners! Just wanted to announce the Kubeflow 1.9.0-rc.2 release; it's the last one before we go final. Please take a look at the Release Notes here and help us validate the manifests by posting a /lgtm comment in this issue.
cc @ca-scribner @DnPlas @yhwang @johnugeorge @nagar-ajay @thesuperzapper @liuqi @xujinheng @alexeadem @alex-treebeard
/lgtm
Tested in QBO: api:cloud-stage-4.3.0.7aba1d45
Kubeflow: 1.9.0-rc.2
Kubernetes: v1.29.4
NVIDIA GPU operator:
helm list -n gpu-operator
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
gpu-operator-1719811045 gpu-operator 1 2024-07-01 05:17:27.654568661 +0000 UTC deployed gpu-operator-v24.3.0 v24.3.0
Recording: https://youtu.be/-CrtjPsVbUY
@liuqi @xujinheng Can you please confirm if you are testing Kubeflow 1.9.0-rc.2 manifests and let us know if it looks good?
@yhwang Please let us know if you tested Kubeflow 1.9.0-rc.2 and it looks good.
/lgtm
verified the 1.9.0-rc.2 on IKS using the following settings:
Even better, test the 1.9 branch in general, because it will contain the final release https://github.com/kubeflow/manifests/commits/v1.9-branch/ and further fixes such as
Yes, we are currently testing Kubeflow 1.9.0-rc2. Once we complete our testing, we will post the results here to keep you informed.
/lgtm - verified workflows mentioned in the tracking issue. https://github.com/nutanix/kubeflow-manifests/issues/21
As mentioned above, rc.2 does not contain all fixes.
Failed to pull image "docker.io/kserve/models-web-app": no matching manifest for linux/arm64/v8 in the manifest list entries
On a MacBook with an M3 CPU, using minikube start --cpus 8 --memory 8192 --kubernetes-version=v1.29 --driver=docker
@tiansiyuan This thread is exclusively for tracking work with the Kubeflow Distribution owners to test the 1.9 release. Please open an issue in https://github.com/kserve/models-web-app
This is the current status of the Distribution Testing on July 10th:
Distribution | Representative(s) | State |
---|---|---|
Charmed Kubeflow | @DnPlas | Pending |
IBM IKS | @Tomcli @yhwang | LGTM |
Nutanix | @johnugeorge @nagar-ajay | LGTM |
Red Hat OpenShift AI | @rimolive | Pending |
Oracle Cloud Infrastructure | @julioo | Pending |
DeployKF | @thesuperzapper | Pending |
VMWare | @liuqi @xujinheng | Pending |
QBO | @alexeadem | LGTM |
We need your updates as quickly as possible: our release date is July 22nd, and we want to be able to act on any bug reports in time.
https://github.com/kserve/models-web-app/issues/88
Done.
Hello, please retest with the 1.9 branch https://github.com/kubeflow/manifests/commits/v1.9-branch/ given the merge of https://github.com/kubeflow/manifests/pull/2795. Testing RC.2 is not enough.
If no further bugs come up, I will synchronize any last-minute release tags from the other working groups on July 20-21 and do the changelog and final release on July 22.
@rimolive if I do not get any more final releases/tags from the other WGs, I probably have to release as is on July 22. You can also decide as release manager that we cut RC.3 and delay the final release.
@juliusvonkohout hold on, we need the remaining WGs to cut their final releases. We cannot release 1.9 with components in RC releases.
To cite myself from a few messages above: "if i do not get any more final releases/tags from the other WGs, i probably have to release as is on July 22. You can also decide as release manager that we cut RC.3 and delay the final release."
I won't do anything today or while I am on vacation :-D As I said, July 20-22 is when I can do the remaining stuff. But you have to decide what we do if the final releases/tags from other WGs are not available on July 22. This could be the case and was the case in the previous releases. If so, the question is whether you want to release anyway, or cut an RC.3 on July 22 and delay the final release. Just think about it ;-)
This is the current status of the Distribution Testing on July 15th:
Distribution | Representative(s) | State |
---|---|---|
Charmed Kubeflow | @DnPlas | Pending |
IBM IKS | @Tomcli @yhwang | LGTM |
Nutanix | @johnugeorge @nagar-ajay | LGTM |
Red Hat OpenShift AI | @rimolive | Pending |
Oracle Cloud Infrastructure | @julioo | Pending |
DeployKF | @thesuperzapper | Pending |
VMWare | @liuqi @xujinheng | Pending |
QBO | @alexeadem | LGTM |
We have had no changes in 5 days, and the release date for 1.9 is next week. Please send us your updates so we can make sure all Distributions are good with the release.
@juliusvonkohout @rimolive @StefanoFioravanzo I have cut the final v1.9.0 tag for the kubeflow/kubeflow repo, feel free to sync the manifests for this tag into kubeflow/manifests.
Hey @rimolive, here is my latest update:
version: 1.9.0-rc.2 platform:
So far it is looking good, so for that version /lgtm.
Hello,
This is the status for today July 22nd:
Distribution | Representative(s) | State |
---|---|---|
Charmed Kubeflow | @DnPlas | LGTM |
IBM IKS | @Tomcli @yhwang | LGTM |
Nutanix | @johnugeorge @nagar-ajay | LGTM |
Red Hat OpenShift AI | @rimolive | LGTM |
Oracle Cloud Infrastructure | @julioo | Pending |
DeployKF | @thesuperzapper | Pending |
VMWare | @liuqi @xujinheng | Pending |
QBO | @alexeadem | LGTM |
We see that the majority of distributions agree on the state of the release. Thank you so much to everyone involved in the testing. We'll keep accepting feedback on cases we can consider for 1.9 patch releases.
Is anyone here encountering this bug/PR?
https://github.com/kubeflow/manifests/pull/2815 https://github.com/kubeflow/manifests/issues/2812 https://github.com/kubeflow/manifests/issues/2766
It has not been changed in 7 months https://github.com/kubeflow/manifests/commits/master/common/dex/base/config-map.yaml , but some users are complaining
not in QBO
This issue will be used to track the progress of and coordinate with distributions along the 1.9 release.
While we hope all distros will manage to be ready when the KF 1.9 release is out, this is sometimes difficult to achieve. In this issue, we want to both keep track of the progress of distributions towards the KF 1.9 release and also know which of the distros will be working on KF 1.9 (testing during the distribution testing cycle) even if they can't meet the KF 1.9 deadline.
Tagging distribution owners identified from previous releases (Any new or missed distro owners, please comment on this issue)
@yhwang
@nagar-ajay
@xujinheng
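Please let us know if you'll be participating in the 1.9 release by answering the following questions:
- Are you planning on having your distro ready in sync with the KF 1.9 release?
- Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?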
Please note the release timelines are being discussed in kubeflow/manifests#2606.
cc @kubeflow/release-team @jbottum