openshift / api

Canonical location of the OpenShift API definition.
http://www.openshift.org
Apache License 2.0

CORS-3594: Setting CAPG as the default infra provider #1958

Closed barbacbd closed 1 month ago

barbacbd commented 2 months ago

CAPG should be used as the default infra provider for GCP installs.

openshift-ci-robot commented 2 months ago

@barbacbd: This pull request references CORS-3594 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.

openshift-ci[bot] commented 2 months ago

Hello @barbacbd! Some important instructions when contributing to openshift/api: API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

barbacbd commented 2 months ago

/label platform/google

barbacbd commented 2 months ago

/cc @patrickdillon /cc @r4f4 /cc @bfournie

openshift-ci[bot] commented 2 months ago

@barbacbd: The label(s) platform/google cannot be applied, because the repository doesn't have them.

barbacbd commented 2 months ago

/retest-required

deads2k commented 2 months ago

/test verify

Now that https://github.com/openshift/api/pull/1909 has merged, this might be ready. If not, try again in an hour.

patrickdillon commented 2 months ago

1909 fixed the tests, which are now failing with:

 INSUFFICIENT CI testing for "ClusterAPIInstallGCP".
F0715 17:49:34.051041  169158 root.go:64] Error running codegen: error: "install should succeed: infrastructure" only passed 71%, need at least 95% for "ClusterAPIInstallGCP" on {gcp amd64 ha} 

The figure 71% seems off to me. That is, I don't think the infrastructure provisioning success rate is that low. I'm not sure where the discrepancy is coming from.
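For reference, the gate in the log above amounts to comparing an observed pass percentage against a required minimum. Here is a minimal sketch in Python, not the actual openshift/api codegen check: the function name and the run counts are hypothetical, and only the 95% threshold and the test and feature-gate names come from the log.

```python
def check_pass_rate(test_name, gate, passed, total, required_pct=95.0):
    """Return an error string if the pass rate is below required_pct, else None.

    Hypothetical helper for illustration; not the real verify implementation.
    """
    if total == 0:
        return f'no runs found for "{test_name}" on "{gate}"'
    pct = 100.0 * passed / total
    if pct < required_pct:
        return (f'"{test_name}" only passed {pct:.0f}%, '
                f'need at least {required_pct:.0f}% for "{gate}"')
    return None


# Hypothetical counts, chosen only to land below the threshold.
err = check_pass_rate(
    "install should succeed: infrastructure",
    "ClusterAPIInstallGCP",
    passed=71,
    total=100,
)
print(err)
```

Whether a flaked run counts as a pass changes the numerator, which is one way two dashboards can report different percentages for the same jobs.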

I'm reviewing the GCP Tech preview installs here: https://sippy.dptools.openshift.org/sippy-ng/jobs/4.17/runs?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-techpreview%22%7D%5D%7D&pageSize=100&sort=desc&sortField=timestamp

Reviewing these failures, the significant one I see is the credentials request failure which recurs multiple times, including this example: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-techpreview/1806798263932686336

That issue was not related to ClusterAPIInstallGCP and was fixed in: https://issues.redhat.com/browse/OCPBUGS-36294

The only issue I see related to ClusterAPIInstallGCP is

level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed during pre-provisioning: failed to add worker roles: failed to set project IAM policy: googleapi: Error 409: There were concurrent policy changes. Please retry the whole read-modify-write with exponential backoff. The request's ETag '\007\006\033\255\347+\335\210' did not match the current policy's ETag '\007\006\033\255\347>%\332'., aborted
Installer 

from: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-techpreview/1805429311365189632

That is something we'll want to fix and would potentially be fixed by https://issues.redhat.com/browse/CORS-3567
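The 409 above asks the caller to retry the whole read-modify-write cycle with exponential backoff. A minimal sketch of that pattern follows, assuming an in-memory stand-in for an ETag-guarded policy store. `FakePolicyStore`, `ConflictError`, and `add_binding_with_retry` are hypothetical names, not the installer's code or Google's API.

```python
import random
import time


class ConflictError(Exception):
    """Raised when the supplied ETag no longer matches the stored policy."""


class FakePolicyStore:
    """Hypothetical in-memory stand-in for an ETag-guarded policy API."""

    def __init__(self):
        self.bindings = []
        self.version = 0  # plays the role of the ETag

    def get(self):
        return list(self.bindings), self.version

    def set(self, bindings, etag):
        if etag != self.version:
            raise ConflictError("concurrent policy change")
        self.bindings = list(bindings)
        self.version += 1


def add_binding_with_retry(store, binding, attempts=5, base_delay=0.01,
                           sleep=time.sleep):
    """Add a binding, redoing the full read-modify-write on each conflict.

    Returns the number of attempts used.
    """
    for attempt in range(attempts):
        bindings, etag = store.get()      # read a fresh copy every time
        if binding not in bindings:
            bindings.append(binding)      # modify
        try:
            store.set(bindings, etag)     # write, guarded by the ETag
            return attempt + 1
        except ConflictError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff with jitter before the next full cycle.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    raise ValueError("attempts must be at least 1")
```

The key point is that the read happens inside the loop: retrying only the write with a stale ETag would fail again with the same conflict.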

stbenjam commented 2 months ago

/test verify

stbenjam commented 2 months ago

> The figure 71% seems off to me. That is, I don't think the infrastructure provisioning success rate is that low. I'm not sure where the discrepancy is coming from.

I'm trying to work out where the 71% number came from, but techpreview GCP infra is low. The default Sippy view is "Working", which counts flake + success. For this check we're using success only.

Sippy is currently saying 89% (there's a toggle in the toolbar to switch between Working and Passing).
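To make the difference between the two views concrete, here is a small sketch that computes both rates from a list of run outcomes. The outcome labels and the sample data are hypothetical and are not taken from Sippy. Only the definitions (Working = flake + success, Passing = success only) come from the comment above.

```python
from collections import Counter


def rates(outcomes):
    """Return (passing, working) as fractions of all runs.

    passing counts only clean successes; working also counts flakes.
    """
    if not outcomes:
        raise ValueError("no runs to evaluate")
    counts = Counter(outcomes)
    total = len(outcomes)
    passing = counts["success"] / total
    working = (counts["success"] + counts["flake"]) / total
    return passing, working


# Hypothetical outcomes for ten runs of one test.
runs = ["success"] * 7 + ["flake"] * 2 + ["failure"]
passing, working = rates(runs)
print(f"passing={passing:.0%} working={working:.0%}")
```

With any flakes in the sample, the Working rate is strictly higher than the Passing rate, so a gate based on Passing can fail while the default view looks healthy.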

https://sippy.dptools.openshift.org/sippy-ng/tests/4.17/details?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522current_runs%2522%252C%2522operatorValue%2522%253A%2522%253E%253D%2522%252C%2522value%2522%253A%25227%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Afalse%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522Platform%253Agcp%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Afalse%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522FeatureSet%253Atechpreview%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522Topology%253Aha%2522%257D%252C%257B%2522id%2522%253A99%252C%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522infrastructure%2522%257D%252C%257B%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522install%2520should%2522%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D&period=default&sort=asc&sortField=current_pass_percentage&view=Passing

stbenjam commented 2 months ago

Non-techpreview GCP definitely has much higher infra success https://sippy.dptools.openshift.org/sippy-ng/tests/4.17/details?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522current_runs%2522%252C%2522operatorValue%2522%253A%2522%253E%253D%2522%252C%2522value%2522%253A%25227%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Afalse%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522Platform%253Agcp%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Atrue%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522FeatureSet%253Atechpreview%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522Topology%253Aha%2522%257D%252C%257B%2522id%2522%253A99%252C%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522infrastructure%2522%257D%252C%257B%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522install%2520should%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522Architecture%253Aamd64%2522%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D&period=default&sort=asc&sortField=current_pass_percentage&view=Passing
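The `filters` parameter in the two Sippy links above is percent-encoded twice, which makes it hard to see which variants each link selects. As a rough sketch, one filter item copied verbatim from those URLs can be decoded with the Python standard library. The helper name `decode_filter` is hypothetical.

```python
import json
from urllib.parse import unquote

# One filter item copied from the `filters` query parameter in the links above.
encoded = (
    "%257B%2522columnField%2522%253A%2522name%2522%252C"
    "%2522operatorValue%2522%253A%2522contains%2522%252C"
    "%2522value%2522%253A%2522install%2520should%2522%257D"
)


def decode_filter(fragment):
    """Undo two rounds of percent-encoding, then parse the JSON object."""
    return json.loads(unquote(unquote(fragment)))


item = decode_filter(encoded)
print(item)
```

Decoding the full parameter the same way shows that the two links differ in the `not` flag on the `FeatureSet:techpreview` variant filter, and that the second link adds an `Architecture:amd64` filter.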

openshift-ci[bot] commented 2 months ago

@2uasimojo: This PR was included in a payload test run from openshift/installer#8723 trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e013e7f0-4453-11ef-8e9d-1b4fee3fe2e1-0

patrickdillon commented 2 months ago

/test verify

patrickdillon commented 2 months ago

Just reran verify and it looks like our bug fixes are paying off and we're trending in the right direction (86%, up from 71%):

 F0729 19:20:08.510204  169977 root.go:64] Error running codegen: error: "install should succeed: infrastructure" only passed 86%, need at least 95% for "ClusterAPIInstallGCP" on {gcp amd64 ha} 
patrickdillon commented 2 months ago

/test verify

bfournie commented 2 months ago

/lgtm

patrickdillon commented 2 months ago

I ran ~20 GCP techpreview jobs yesterday using gangway. Looking at the infrastructure test links that @stbenjam posted above, I believe we are now seeing a success rate of ~98%:

GCP TechPreview Infrastructure

This seems to be actually higher than the non-tech preview tests, which are at around 96-97%:

Non-tech preview

In other words, despite the verify test failures, this looks good to me with regard to CI testing.

patrickdillon commented 1 month ago

/test verify

r4f4 commented 1 month ago

/lgtm

patrickdillon commented 1 month ago

/retest-required /skip

JoelSpeed commented 1 month ago

/lgtm

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: barbacbd, bfournie, JoelSpeed, r4f4

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/api/blob/master/OWNERS)~~ [JoelSpeed]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

openshift-ci[bot] commented 1 month ago

@barbacbd: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure | a912e2ab1441dda7183d1af14b4a0252a118934a | link | false | `/test e2e-azure` |

Full PR test history. Your PR dashboard.

openshift-bot commented 1 month ago

[ART PR BUILD NOTIFIER]

Distgit: ose-cluster-config-api
This PR has been included in build ose-cluster-config-api-container-v4.18.0-202408022143.p0.g346347b.assembly.stream.el9. All builds following this will include this PR.