openshift / assisted-service

OCPBUGS-27238: Use both the OCP cluster trusted certs and user certs #6649

Closed carbonin closed 1 month ago

carbonin commented 1 month ago

Previously, when a user provided mirror registry certs, the assisted-service pod was deployed in such a way that those were the only certs trusted by most commands running in the pod.

This would cause issues when, for example, the spoke cluster release image is mirrored internally, but the hub cluster image is not.

This was the case in https://issues.redhat.com/browse/OCPBUGS-27238 where assisted-service failed to pull the hub cluster release image because it didn't trust a certificate it otherwise should have.

To address this, the infrastructure-operator creates a configmap which is labeled such that the cluster network operator will inject the public CA bundle into it, as described in [1]. This content is then merged with the user-provided content (if any is provided) into a third configmap, which is mounted into the assisted-service container.

[1] https://docs.openshift.com/container-platform/4.16/networking/configuring-a-custom-pki.html#certificate-injection-using-operators_configuring-a-custom-pki
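The merge step described above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: the function name `mergeCABundles` is hypothetical, and dropping duplicate certificates is an assumption that the PR description does not mention.

```go
package main

import (
	"encoding/pem"
	"fmt"
	"strings"
)

// mergeCABundles (hypothetical helper) concatenates the CERTIFICATE blocks
// found in each PEM bundle, in order, skipping exact duplicates and any
// non-certificate blocks. Deduplication is an assumption of this sketch.
func mergeCABundles(bundles ...string) string {
	seen := make(map[string]bool)
	var out strings.Builder
	for _, bundle := range bundles {
		rest := []byte(bundle)
		for {
			var block *pem.Block
			block, rest = pem.Decode(rest)
			if block == nil {
				break // no more PEM blocks in this bundle
			}
			if block.Type != "CERTIFICATE" {
				continue
			}
			key := string(block.Bytes)
			if seen[key] {
				continue
			}
			seen[key] = true
			out.Write(pem.EncodeToMemory(block))
		}
	}
	return out.String()
}

func main() {
	// Placeholder payloads: valid base64, but not real certificates.
	cluster := "-----BEGIN CERTIFICATE-----\nAQID\n-----END CERTIFICATE-----\n"
	user := "-----BEGIN CERTIFICATE-----\nBAUG\n-----END CERTIFICATE-----\n"
	fmt.Print(mergeCABundles(cluster, user))
}
```

An empty user bundle simply contributes nothing, which matches the "if any is provided" wording in the description.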

List all the issues related to this PR

https://issues.redhat.com/browse/OCPBUGS-27238

What environments does this code impact?

Operator Managed Deployments

How was this code tested?

Tested manually in a dev-scripts environment to see that the cert configmaps were created correctly. Relying on the CI disconnected job to test the disconnected case.

carbonin commented 1 month ago

/test ?

carbonin commented 1 month ago

/test edge-e2e-ai-operator-ztp-disconnected

carbonin commented 1 month ago

@omertuc pinged you as this will likely also resolve https://issues.redhat.com/browse/ACM-12866 unless I'm misunderstanding the issue.

openshift-ci-robot commented 1 month ago

@carbonin: This pull request references Jira Issue OCPBUGS-27238, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

carbonin commented 1 month ago

/jira refresh

openshift-ci-robot commented 1 month ago

@carbonin: This pull request references Jira Issue OCPBUGS-27238, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

carbonin commented 1 month ago

/jira refresh

openshift-ci-robot commented 1 month ago

@carbonin: This pull request references Jira Issue OCPBUGS-27238, which is valid.

3 validation(s) were run on this bug
* bug is open, matching expected state (open)
* bug target version (4.17.0) matches configured target version for branch (4.17.0)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 76.19048% with 15 lines in your changes missing coverage. Please review.

Project coverage is 68.69%. Comparing base (a969ac4) to head (a9efb9b). Report is 8 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| ...oller/controllers/agentserviceconfig_controller.go | 76.19% | 9 Missing and 6 partials :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #6649      +/-   ##
==========================================
+ Coverage   68.55%   68.69%   +0.14%
==========================================
  Files         246      246
  Lines       36691    37033     +342
==========================================
+ Hits        25152    25440     +288
- Misses       9299     9339      +40
- Partials     2240     2254      +14
```

| Files | Coverage Δ | |
|---|---|---|
| internal/controller/controllers/common.go | `79.59% <ø> (ø)` | |
| ...oller/controllers/agentserviceconfig_controller.go | `84.10% <76.19%> (-0.41%)` | :arrow_down: |

... and 9 files with indirect coverage changes
omertuc commented 1 month ago

This is great, thank you

carbonin commented 1 month ago

Looks like the ztp job is failing with something cert related :worried:

    message: "The Spec could not be synced due to backend error: failed to get release
      image 'registry.build03.ci.openshift.org/ci-op-741s0081/release@sha256:6d80f695c4db0a6048613a25e7b9b0499efa90b236e0330292d559c68d5d5835'.
      Please ensure the releaseImage field in ClusterImageSet 'openshift-v4.17' is
      valid,  (error: command 'oc adm release info -o template --template '{{.metadata.version}}'
      --insecure=false registry.build03.ci.openshift.org/ci-op-741s0081/release@sha256:6d80f695c4db0a6048613a25e7b9b0499efa90b236e0330292d559c68d5d5835
      --registry-config=/tmp/registry-config3793143290' exited with non-zero exit
      code 1: \nerror: unable to read image registry.build03.ci.openshift.org/ci-op-741s0081/release@sha256:6d80f695c4db0a6048613a25e7b9b0499efa90b236e0330292d559c68d5d5835:
      Get \"https://registry.build03.ci.openshift.org/v2/\": x509: certificate signed
      by unknown authority\n)."
carbonin commented 1 month ago

Looks like the cluster ca bundle didn't get filled into the config map

            "apiVersion": "v1",
            "data": {
                "ca-bundle.crt": ""
            },
            "kind": "ConfigMap",
            "metadata": {
                "creationTimestamp": "2024-08-05T21:39:02Z",
                "name": "assisted-trusted-ca-bundle",
                "namespace": "assisted-installer",
                "ownerReferences": [
                    {
                        "apiVersion": "agent-install.openshift.io/v1beta1",
                        "blockOwnerDeletion": true,
                        "controller": true,
                        "kind": "AgentServiceConfig",
                        "name": "agent",
                        "uid": "97c964bf-0ad5-475b-9bb1-9eb7d29b9da7"
                    }
                ],
                "resourceVersion": "39147",
                "uid": "a3890e40-b9c7-49d2-b2a5-b75d369bed8b"
            }
        },
        {
            "apiVersion": "v1",
            "kind": "ConfigMap",
            "metadata": {
                "annotations": {
                    "config.openshift.io/inject-trusted-cabundle": "true"
                },
                "creationTimestamp": "2024-08-05T21:39:02Z",
                "name": "cluster-trusted-ca-bundle",
                "namespace": "assisted-installer",
                "ownerReferences": [
                    {
                        "apiVersion": "agent-install.openshift.io/v1beta1",
                        "blockOwnerDeletion": true,
                        "controller": true,
                        "kind": "AgentServiceConfig",
                        "name": "agent",
                        "uid": "97c964bf-0ad5-475b-9bb1-9eb7d29b9da7"
                    }
                ],
                "resourceVersion": "39144",
                "uid": "02697385-7588-4f74-ad5c-95bdae743980"
            }
        },
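The empty `ca-bundle.crt` in the dump above means the mounted bundle carries no certificates at all, which is consistent with the `x509: certificate signed by unknown authority` error from the ZTP job. As a rough illustration only (the service shells out to `oc` rather than building a pool like this, and `certPoolFromBundle` is a hypothetical name), a guard against an empty bundle could look like this in Go:

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
)

// certPoolFromBundle (hypothetical helper) builds a certificate pool from PEM
// data and returns an error when no certificate could be parsed, for example
// when the bundle is an empty string.
func certPoolFromBundle(bundle string) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM([]byte(bundle)) {
		return nil, errors.New("bundle contains no parsable certificates")
	}
	return pool, nil
}

func main() {
	// Mirrors the "ca-bundle.crt": "" value seen in the configmap dump.
	if _, err := certPoolFromBundle(""); err != nil {
		fmt.Println("empty bundle:", err)
	}
}
```

Failing early on an empty bundle makes the missing injection visible before any registry request is attempted.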
carbonin commented 1 month ago

:facepalm: It's a label, not an annotation
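To make the distinction concrete, here is a small Go sketch. It uses a simplified stand-in struct (`objectMeta`) instead of the real Kubernetes types, and the helper name `requestsCABundleInjection` is hypothetical; only the key `config.openshift.io/inject-trusted-cabundle` comes from the configmap dump and the linked documentation.

```go
package main

import "fmt"

// Key taken from the configmap dump in this thread.
const injectTrustedCABundleKey = "config.openshift.io/inject-trusted-cabundle"

// objectMeta is a simplified stand-in for Kubernetes object metadata,
// used here only to keep the sketch free of external dependencies.
type objectMeta struct {
	Name        string
	Labels      map[string]string
	Annotations map[string]string
}

// requestsCABundleInjection (hypothetical helper) reports whether the
// metadata carries the injection key as a label set to "true".
// Annotations are deliberately ignored.
func requestsCABundleInjection(meta objectMeta) bool {
	return meta.Labels[injectTrustedCABundleKey] == "true"
}

func main() {
	annotated := objectMeta{
		Name:        "cluster-trusted-ca-bundle",
		Annotations: map[string]string{injectTrustedCABundleKey: "true"},
	}
	labeled := objectMeta{
		Name:   "cluster-trusted-ca-bundle",
		Labels: map[string]string{injectTrustedCABundleKey: "true"},
	}
	fmt.Println(requestsCABundleInjection(annotated)) // false
	fmt.Println(requestsCABundleInjection(labeled))   // true
}
```

The first case corresponds to the configmap shown earlier in the thread, where the key was set under `annotations` and no bundle was injected.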

carbonin commented 1 month ago

/test edge-e2e-ai-operator-ztp-disconnected

carbonin commented 1 month ago

/retest

carbonin commented 1 month ago

Can someone take a look at this one?

Maybe @omertuc or @CrystalChun ?

carbonin commented 1 month ago

> maybe we could include some doc

Yeah, you're probably right. I'll find a place for this.

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carbonin, CrystalChun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/assisted-service/blob/master/OWNERS)~~ [CrystalChun,carbonin]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.
openshift-ci-robot commented 1 month ago

/retest-required

Remaining retests: 0 against base HEAD 885904784c71d90ab4014c096e3348d08b174849 and 2 for PR HEAD a9efb9b8af3473a6a49acf662ae2a2ba59d98c8a in total

openshift-ci-robot commented 1 month ago

/retest-required

Remaining retests: 0 against base HEAD 84f998ead4061df0d401c72601878a3b89b33e71 and 1 for PR HEAD a9efb9b8af3473a6a49acf662ae2a2ba59d98c8a in total

openshift-ci[bot] commented 1 month ago

@carbonin: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/edge-e2e-ai-operator-disconnected-capi | a9efb9b8af3473a6a49acf662ae2a2ba59d98c8a | link | false | `/test edge-e2e-ai-operator-disconnected-capi` |

openshift-ci-robot commented 1 month ago

@carbonin: Jira Issue OCPBUGS-27238: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-27238 has been moved to the MODIFIED state.

openshift-bot commented 1 month ago

[ART PR BUILD NOTIFIER]

Distgit: ose-agent-installer-api-server

This PR has been included in build ose-agent-installer-api-server-container-v4.17.0-202408091314.p0.gf40fe42.assembly.stream.el9. All builds following this will include this PR.