Closed PeterYurkovich closed 3 days ago
@PeterYurkovich: This pull request references OU-450 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.
Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all
/test all
/retest
/test go-fmt
/retest
/test jsonnet-fmt
/test all
/hold
Adding hold to ensure that the test image commit is removed before merging.
/retest
/unhold
Failed to launch with cluster-bot: logs logs2
level=error msg=Cluster operator monitoring Available is False with UpdatingConsolePluginComponentsFailed: UpdatingConsolePluginComponents: reconciling Console Plugin Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/monitoring-plugin: context deadline exceeded: got 2 unavailable replicas
Re-tested with cluster-bot; monitoring-plugin works well with no other regression issues. /label qe-approved
@PeterYurkovich: This pull request references OU-450 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.
BTW, can I get a brief explanation of why we deleted the monitoring-plugin ConfigMap?
> BTW, can I get a brief explanation of why we deleted the monitoring-plugin ConfigMap?
We are migrating the backend to Go, so the config map holding the old nginx configuration is not needed anymore.
/retest
Thanks for this.
Great to see that we’re working on unifying the plugin backends. I had raised this with the console folks https://redhat-internal.slack.com/archives/C6A3NV5J9/p1700750927381839 but never pushed it. I assume the next step would be to unify the Go backends now (https://github.com/openshift/monitoring-plugin/tree/main/pkg and https://github.com/openshift/logging-view-plugin/tree/main/pkg/server etc.) into a common repository?
I will run some of the IPv6 and dual-stack payloads here as it took us some time and tweaks to make nginx work there:
/payload-job periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-dualstack periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6
Also, looking at some of the backend logs, I can see:
time="2024-08-22T21:20:59Z" level=info msg="enabled features: []\n" module=main
time="2024-08-22T21:20:59Z" level=warning msg="cannot read config file, serving plugin with default configuration, tried /etc/plugin/config.yaml" error="open /etc/plugin/config.yaml: no such file or directory" module=server
time="2024-08-22T21:20:59Z" level=error msg="cannot read base manifest file" error="open opt/app-root/web/dist/plugin-manifest.json: no such file or directory" module=manifest
time="2024-08-22T21:20:59Z" level=info msg="listening on https://:9443" module=server
I don’t know if that’s normal. If it is, could we hide or omit the error?
Note that our e2e tests regarding the plugin aren’t complete, so maybe we’re letting something slip. For example, I don’t think we check that the backend is reachable (I’ll try to add a check in CMO, even though a more realistic test would involve the console being able to reach the backend).
Also, adding a readiness probe to the Deployment could be helpful.
@machine424: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/94996e30-644c-11ef-8921-93ab56f198e1-0
@machine424: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/bf4bc240-644c-11ef-91c4-a6d85d676f0b-0
Here is the PR to add that "readiness/reachability" check: https://github.com/openshift/cluster-monitoring-operator/pull/2444. Please tell me if that's sufficient.
> here is the PR to add that "readiness/reachability" check https://github.com/openshift/cluster-monitoring-operator/pull/2444, please tell me if that's sufficient.
That readiness check seems sufficient to me. I wouldn't expect or want CMO to check for much other than valid JSON as changes upstream shouldn't impact the downstream distribution tests.
> Great to see that we’re working on unifying the plugin backends. I had raised this with the console folks redhat-internal.slack.com/archives/C6A3NV5J9/p1700750927381839 but never pushed it. I assume the next step would be to unify the Go backends now (openshift/monitoring-plugin@main/pkg and openshift/logging-view-plugin@main/pkg/server etc.) into a common repository?
The primary reasons for us making this change are:
1) Standardization across all our plugins (1, 2, 3, 4, 5)
2) Greater control over the server in preparation for server-side changes to the plugin-manifest.json, as well as feature flags to support an Alerting UI in ACM
I think that bringing all the backends together in a single repository could be helpful, but it isn't in our short term goals that I'm aware of.
> I don’t know if that’s normal. If it is, could we hide or omit the error?
That error seems out of place, as the Go server should be able to locate the plugin-manifest.json. I'll take a look at it.
/retest
/retest
/retest
/retest
Thanks for the context.
I see the server exposes a /health route (https://github.com/openshift/monitoring-plugin/blob/eb2a1bd822dc7a6f490276121bd654c5b000127d/pkg/server.go#L82). What do you think about adding a readinessProbe to the Deployment based on that, like we do in https://github.com/openshift/cluster-monitoring-operator/blob/f7e92e869c43fa0455d656dcfc89045b60e5baa1/assets/admission-webhook/deployment.yaml#L60-L64? (No need to set up certs and such, as kubelet will ignore that.)
That will help us ensure real, minimal readiness of the plugin (the server starts up and can serve a route) on all the flavors (IPv4, IPv6, dual-stack, etc.) that origin tests run on, and https://github.com/openshift/cluster-monitoring-operator/pull/2444 will help us ensure that /plugin-manifest.json is accessible as well on the flavors that CMO e2e tests run on.
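For reference, such a readinessProbe could look roughly like this (a sketch only; the scheme and port are assumptions based on the server listening on https://:9443 in the logs above, and the delay/period values are placeholders):

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 9443
    scheme: HTTPS
  initialDelaySeconds: 5
  periodSeconds: 10
```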
/test all
/retest
The latest commits add a readiness probe as requested and fix the error that was being shown. An invalid default value was fixed upstream (fix) and this PR now also passes in that value.
Good news, thanks again for this.
(I can see the server is logging every call to /health; I think we can hide those. It's not a blocker for this PR.)
I'll re-run those IPv6 and dual-stack payloads just in case:
/payload-job periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-dualstack periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6
But this /lgtm. I'll /hold just in case @simonpasquier has anything to add.
@machine424: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/bf47de70-6a9e-11ef-911b-0b6f05cec670-0
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: machine424, PeterYurkovich
The full list of commands accepted by this bot can be found here.
The pull request process is described here
going once, going twice... /unhold /retitle OU-450: Migrate Monitoring Plugin Deployment from nginx to Go server
Thanks again @PeterYurkovich for this. (Tell me if I should create a follow up ticket to reduce the verbosity on /health)
I'll create the ticket in our board. Most of the backend code is currently shared, so this will need to be done across all of our plugins; I'll also make sure that our other deployments have a similar readiness/liveness probe.
Thanks for working through this
Edit: Issue here - https://issues.redhat.com/browse/OU-525
/retest-required
Remaining retests: 0 against base HEAD 1c9d5ecfcc0ddf6bfeff9a3a11310d39c6ba30b5 and 2 for PR HEAD 40da92314327f62552580d6d9a6d193efea75596 in total
@PeterYurkovich: all tests passed!
Full PR test history. Your PR dashboard.
[ART PR BUILD NOTIFIER]
Distgit: cluster-monitoring-operator This PR has been included in build cluster-monitoring-operator-container-v4.18.0-202409052113.p0.g886a4be.assembly.stream.el9. All builds following this will include this PR.
This PR updates CMO to format the monitoring-plugin Kubernetes objects such that they will work with the new Go backend for monitoring-plugin. This PR should not be merged until its associated PR in the monitoring-plugin repo has been merged.
The primary change in this PR is the removal of the nginx configuration, which is no longer needed as the Go backend will act as the web server.