kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[test] `kubeflow-pipeline-e2e-test` and `kubeflow-pipeline-upgrade-test` are broken and block presubmit #10779

Open chensun opened 2 months ago

chensun commented 2 months ago

This is caused by how we build the test images through docker-in-docker, which is broken on the latest available GKE versions.

More context: https://github.com/kubeflow/pipelines/blob/cd16a33e735b30a85b2e736039f72c2ed6d26507/test/deploy-cluster.sh#L90-L97

In addition, cluster version 1.25 is no longer available on GKE, so cluster deployment fails, for example:

++ gcloud container clusters create e2e-f243fff-2539 --image-type cos_containerd --release-channel stable --cluster-version 1.25 --num-nodes=2 --machine-type=e2-standard-8 --enable-autoscaling --max-nodes=8 --min-nodes=2 --workload-pool=ml-pipeline-test.svc.id.goog
WARNING: Currently VPC-native is not the default mode during cluster creation. In the future, this will become the default mode and can be disabled using `--no-enable-ip-alias` flag. Use `--[no-]enable-ip-alias` flag to suppress this warning.
WARNING: Starting with version 1.18, clusters will have shielded GKE nodes by default.
WARNING: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s). 
ERROR: (gcloud.container.clusters.create) ResponseError: code=400, message=No valid versions with the prefix "1.25" found.
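
One way to surface this failure before `gcloud container clusters create` runs is to check the pinned prefix against the versions GKE currently offers. The following is only a minimal sketch, not what `test/deploy-cluster.sh` does: `pick_version` is a hypothetical helper, the version strings in `AVAILABLE` are made-up samples (in CI the list would have to come from somewhere real, for example parsed from the output of `gcloud container get-server-config`), and the fallback prefix `1.26` is an assumption for illustration.

```shell
#!/bin/sh
# Sketch only: choose a cluster version that is actually offered before
# calling `gcloud container clusters create`.

# pick_version PREFIX VERSION...
# Prints the first VERSION that equals PREFIX or starts with "PREFIX." and
# returns 0; prints nothing and returns 1 if no VERSION matches.
pick_version() {
  prefix="$1"
  shift
  for v in "$@"; do
    case "$v" in
      "$prefix" | "$prefix".*)
        printf '%s\n' "$v"
        return 0
        ;;
    esac
  done
  return 1
}

# Hypothetical sample list, NOT real GKE output.
AVAILABLE="1.27.8-gke.1067004 1.26.5-gke.1200"

# Try the pinned prefix first, then an assumed fallback prefix.
for candidate in 1.25 1.26; do
  # $AVAILABLE is intentionally unquoted so it splits into separate words.
  if version="$(pick_version "$candidate" $AVAILABLE)"; then
    echo "using cluster version $version"
    break
  fi
  echo "no valid versions with the prefix \"$candidate\"" >&2
done
```

Note that a check like this only addresses the unavailable-version error. It does not fix the docker-in-docker image build, which is the reason the tests were pinned to an older version in the first place.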

Impacted by this bug? Give it a 👍.

github-actions[bot] commented 5 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.