Closed simonpasquier closed 2 weeks ago
@simonpasquier: This pull request references MON-3802 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.16.0" version, but no target version was set.
/skip
/cc @bburt-rh /cc @jan--f
current status:
As mentioned in the sync earlier, I would like to make sure we make it very clear to the user what we mean by cross-namespace. IIUC this means user namespaces as well as system namespaces. Since this currently talks about cross-namespace in the context of UWM, I think we should make it explicit that system namespaces can be queried as well for such rules (if that is actually the case).
I'm sure @bburt-rh would know best how to phrase that.
summarizing offline discussions about the console issue
The problem is that cross-namespace rules won't be displayed in the developer console.
As an example, I've deployed a cross-namespace rule (NamespaceNotEnforcingRestrictedPolicy) into the user-monitoring-shared namespace. The rule fires an alert for the ns1 namespace but the alert isn't visible in the dev console:
It is visible in the admin console though:
In terms of user experience, it is less than ideal since a user with only access to the ns1 project can't see the alert being active and they can't silence it if they receive an Alertmanager notification for it.
The reason behind the issue is that the console uses the /api/v1/rules endpoint exposed by prom-label-proxy which will only return alerting rules with a static namespace="<selected namespace>" label.
Possible options being discussed:
Use the /api/v1/alerts endpoint instead. The problem is that we won't have access to the alert definition (including the PromQL expression), which makes it hard for the user to understand the cause and investigate further.
Modify prom-label-proxy to return any rule that matches the given namespace or that has an alert matching the given namespace. It looks like the most appropriate solution and something that also makes sense outside of OCP.
I tested this with https://github.com/openshift/prom-label-proxy/pull/369 and it's almost working. When clicking on the alert link to open the PromQL expression in the metrics dashboard, prom-label-proxy replies with a 400 status code and the message: label matcher value (namespace="user-monitoring-shared") conflicts with injected value (namespace!~"(openshift|kube).*|default"). This is because prom-label-proxy runs with -error-on-replace.
/hold
@simonpasquier: This pull request references MON-3802 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.
/hold cancel
/retest-required
/skip
/assign @machine424
PR tested with cluster-bot: other user-defined namespaces can trigger the alert defined in the user-monitoring-shared namespace.
test case: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-75384
/label qe-approved
@simonpasquier: This pull request references MON-3802 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.
If I set two or more user-monitoring-shared namespaces and then create the same PrometheusRule in each of them, this results in duplicated alerts. Should we warn or restrict users in this situation?
config.yaml: 'namespacesWithoutLabelEnforcement: [ns1, ns2]'
Thanks for the testing @Tai-RedHat! I don't see a strong reason to prevent this situation (and it would be quite hard to detect), but it would be good to mention it in the docs.
I've just realized this is assigned to me, I'll take a look.
From https://github.com/openshift/cluster-monitoring-operator/pull/2307#discussion_r1734441777
(maybe it's "safer" to have RulesWithoutLabelEnforcementAllowed disabled by default.)
My initial intention was to avoid friction in adopting this feature but I'm also ok making it opt-in as it's less surprising for platform admins. We can also keep it opt-in for a few releases and then turn it on by default.
@jan--f WDYT?
/lgtm You'll make many users happy with this.
/retest-required
Remaining retests: 0 against base HEAD e6e76f7d844cc430b8be9ce1a9314d2013faa7b6 and 2 for PR HEAD eaf43fea22ae56557b00bef318de3e52ef4bea8f in total
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: machine424, simonpasquier, slashpai
/retest-required
Remaining retests: 0 against base HEAD 04dbe83e4b4d6b576aa2e14fdbadbd1de3ea2016 and 2 for PR HEAD 6adb5214ee2a8b4a8e143db121471ac2878b74e9 in total
This change introduces a way to deploy user-defined rules which are not scoped to their namespace of origin.
To enable the feature, a user-defined monitoring admin needs to configure at least one namespace in the UWM ConfigMap:
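A minimal sketch of that configuration, assuming the usual user-workload-monitoring-config ConfigMap in the openshift-user-workload-monitoring namespace and the namespacesWithoutLabelEnforcement key quoted earlier in this thread. The namespace name user-monitoring-shared is only the example name used in this PR; any user namespace could be listed.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    # Rules defined in the listed namespaces are evaluated without
    # enforcing the namespace label of origin.
    namespacesWithoutLabelEnforcement: [user-monitoring-shared]
```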
For all PrometheusRule objects defined in the user-monitoring-shared namespace, Prometheus and Thanos Ruler evaluate the PromQL expressions without enforcing the namespace label of origin. It makes it possible to have generic rules that get applied to all (or a subset of) the user projects instead of having individual rule objects in each user project.
The capability is enabled by default but a cluster admin can decide to disable it in the CMO ConfigMap:
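As a sketch only: the field name below and its placement under userWorkload are assumptions inferred from the RulesWithoutLabelEnforcementAllowed option mentioned in the review discussion, so check them against the merged code before relying on them. The ConfigMap name and namespace are the usual ones for the cluster monitoring configuration.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    userWorkload:
      # Assumed field name and location; setting it to false would
      # disable rules without namespace label enforcement.
      rulesWithoutLabelEnforcementAllowed: false
```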
For example, a user-defined admin can create a single rule that fires when a user namespace doesn't enforce the Restricted pod security policy.
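A rough sketch of what such a rule could look like, not the exact rule from this PR. The alert name and namespace come from the discussion above. The object name, the `for` duration, the severity, and the expression are illustrative assumptions. In particular, the expression assumes kube-state-metrics exposes the pod-security.kubernetes.io/enforce namespace label on kube_namespace_labels, which depends on how label allowlisting is configured.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-security-policy            # hypothetical object name
  namespace: user-monitoring-shared    # must be listed in namespacesWithoutLabelEnforcement
spec:
  groups:
  - name: pod-security-policy
    rules:
    - alert: NamespaceNotEnforcingRestrictedPolicy
      # Assumed metric and label names. The expression carries no
      # namespace matcher of its own, so each matching series yields
      # an alert labelled with the namespace it describes.
      expr: kube_namespace_labels{label_pod_security_kubernetes_io_enforce!="restricted"}
      for: 5m
      labels:
        severity: warning
```

Because the namespace label of origin is not enforced for this object, the resulting alerts carry the namespace of the offending project (for example ns1) rather than user-monitoring-shared.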