integr8ly / application-monitoring-operator

Operator for installing the Application Monitoring Stack on OpenShift (Prometheus, AlertManager, Grafana)

Installation of application-monitoring-operator using full declarative language #128

Open slopezz opened 4 years ago

slopezz commented 4 years ago

Context

On the 3scale engineering team, we want to use application-monitoring-operator so that both RHMI and 3scale use the same monitoring stack, which will help both teams move in the same direction, especially since 3scale is working on adding metrics, PrometheusRules, and GrafanaDashboards for the next release.

On the 3scale SRE/Ops team we use OpenShift Hive to provision our on-demand dev OCP clusters (so engineers can easily test metrics, dashboards...), and we use the Hive SyncSet object to apply the same configuration to different OCP clusters: we define all resources once in a single YAML, and then we can apply the same config to any dev cluster by just adding the new cluster's name to the list in the SyncSet object.
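For context, the shape of such a SyncSet is roughly this (a minimal sketch; the cluster names, SyncSet namespace, and embedded resource are hypothetical):

apiVersion: hive.openshift.io/v1
kind: SyncSet
metadata:
  name: dev-monitoring-config
  namespace: hive
spec:
  # Every cluster listed here gets the same set of resources applied
  clusterDeploymentRefs:
    - name: dev-cluster-01
    - name: dev-cluster-02
  resourceApplyMode: Sync
  resources:
    - apiVersion: v1
      kind: Namespace
      metadata:
        name: application-monitoring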

We have seen that the currently documented operator installation involves executing a Makefile target (with Grafana/Prometheus versions), which runs a bash script that executes oc apply on different files, directories, and URLs.

We need an easy way to install the monitoring stack using declarative language (no Makefile target executions), so it is easy to maintain and to keep track of every change for every release on GitHub (GitOps philosophy).

Current workaround

As a workaround, what we are doing now is parsing/extracting all the resources deployed by scripts/install.sh and adding them to a single SyncSet object (which has a specific spec format). However, because OpenShift Hive uses the native Kubernetes APIs, it does not accept some OpenShift apiVersions; for example authorization.openshift.io/v1 needs to be replaced by the native Kubernetes alternative rbac.authorization.k8s.io/v1 (see https://github.com/openshift/hive/issues/864 and https://issues.redhat.com/browse/CO-532). So before creating the SyncSet object we need to fix some resources in order to be fully compatible with Hive:
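For example, the transformation looks like this (the ClusterRole below is a simplified hypothetical, not one of the operator's actual manifests):

# Before: OpenShift-specific API group, rejected by Hive's native Kubernetes client
apiVersion: authorization.openshift.io/v1
kind: ClusterRole
metadata:
  name: example-monitoring-role
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]

# After: Kubernetes-native equivalent, accepted by Hive
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: example-monitoring-role
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]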

We have checked in deploy/cluster-roles/README.md that you use the Integr8ly installer to install application-monitoring-operator (not the Makefile target), and that you don't use the YAMLs in deploy/cluster-roles/.

In order to be fully compatible with Kubernetes (and hence with OpenShift Hive), and to avoid having to transform almost every object ourselves, we wonder if we can open a PR to fix those small issues while remaining fully compatible with OpenShift.

Possible improvement

To make the installation of application-monitoring-operator with a fully declarative language easier, without having to manage all those 25 YAMLs, we have seen that you are already using olm-catalog, so we wonder if you plan to:
  • Either publish the operator on a current OperatorSource like certified-operators, redhat-operators or community-operators (so the operator can be used by anybody)
  • Or maybe just provide in the repository an alternative installation method with a working OperatorSource resource that can be easily deployed on an OpenShift cluster, and then just create a Subscription object to deploy the operator on a given namespace, channel, version... (see the Subscription sketch at the end of this comment)

We have tried to deploy an OperatorSource using data from the Makefile (like registryNamespace: integreatly):

apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  name: integreatly-operators
  namespace: openshift-marketplace
spec:
  displayName: Integreatly operators
  endpoint: https://quay.io/cnr
  publisher: integreatly
  registryNamespace: integreatly
  type: appregistry

But we have seen that only the integreatly operator is available, so we guess application-monitoring-operator might be private.
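For reference, the second option would let us install the operator with nothing more than something like this (a sketch only; the channel, source, and namespace names are assumptions, since the package is not published yet):

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: application-monitoring-operator
  namespace: application-monitoring
spec:
  # Hypothetical channel and catalog source names
  channel: alpha
  name: application-monitoring-operator
  source: integreatly-operators
  sourceNamespace: openshift-marketplace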

ciaran-byrne commented 4 years ago

Tagging @david-martin to get an initial opinion

david-martin commented 4 years ago
> • Either publish the operator on a current OperatorSource like certified-operators, redhat-operators or community-operators (so the operator can be used by anybody)
> • Or maybe just provide in the repository an alternative installation method with a working OperatorSource resource that can be easily deployed on an OpenShift cluster, and then just create a Subscription object to deploy the operator on a given namespace, channel, version...

There are a couple of things happening around integreatly and monitoring in OSD4 that mean I can't give a clear indication of what I think is the best way forward yet. Namely:

> But we have seen that only the integreatly operator is available, so we guess application-monitoring-operator might be private.

I'm not sure what's happening here, or why some element of it would be private. Perhaps @pb82 or @matskiv you may have more insight into the mechanics of OLM and how integreatly pulls in various product operators from quay?

matskiv commented 4 years ago

@david-martin integreatly-operator doesn't pull operators from Quay. We have a manifests folder which contains operator packages. These packages are then baked into the integreatly-operator image, and during installation they are put into a ConfigMap. This ConfigMap is referenced by a CatalogSource CR, which makes OLM aware of the package and enables us to install it via a Subscription.
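In other words, the wiring looks roughly like this (a minimal sketch; the namespace and ConfigMap names are hypothetical):

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: integreatly-manifests
  namespace: integreatly-operator      # hypothetical; must match the ConfigMap's namespace
spec:
  sourceType: configmap
  # Hypothetical ConfigMap holding the packaged CRDs/CSVs/package manifests
  configMap: integreatly-operator-manifests
  displayName: Integreatly Manifests
  publisher: integreatly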

slopezz commented 4 years ago

@matskiv Do you think you could publish the operator on one of the 3 default OperatorSources, or maybe provide the setup needed to install it using OLM (so it is not installed through the current Makefile)?

matskiv commented 4 years ago

@slopezz AFAIK updating one of the 3 default registries would require some manual work for each release. But I think we can automate publishing to our own application repo. E.g. a TravisCI job could be triggered on a new GH release/tag. @david-martin wdyt?
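Something along these lines, perhaps (a sketch only; the manifest directory path and Quay token variable are assumptions):

# .travis.yml (sketch)
language: python
install:
  - pip install operator-courier
deploy:
  provider: script
  # operator-courier push <manifest dir> <quay namespace> <package> <release> <token>
  script: operator-courier push deploy/olm-catalog/application-monitoring-operator integreatly application-monitoring-operator "${TRAVIS_TAG#v}" "$QUAY_TOKEN"
  on:
    tags: true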

slopezz commented 4 years ago

@david-martin what do you think about publishing AMO (at least on integreatly OperatorSource)?

david-martin commented 4 years ago

@slopezz At the moment, it's looking likely we'll drop AMO from the integreatly-operator (in OpenShift 4) and put AMO into maintenance mode on the v0 branch for existing Integreatly/RHMI 1.x clusters on OpenShift 3. I'll explain the thinking behind this further below.

As such, it's unlikely we'd take on the publishing of AMO to an OperatorSource. However, you are more than welcome to do this yourself if you like (we can add you as a contributor, no problem).

The rationale for this is a number of things:

So, right now, our intent in the shorter term is to look into solving some of the above problems in a more efficient and Operator/OLM-friendly way that meets the needs of Integreatly on OpenShift 4, and this is likely to take the form of changes in the integreatly-operator rather than in AMO.

slopezz commented 4 years ago

@david-martin It makes sense; we started using AMO because we thought it was the way to go for application monitoring, following the RHMI strategy.

Right now we are using it mainly for dev purposes, and have found it very useful: the engineering team can easily start playing with Grafana and Prometheus in order to build 3scale dashboards and alerts. But we understand your current concerns about it, and we definitely think that all Red Hat products (not only Integration ones) should use the same monitoring stack, which can lead to the standardization of application monitoring.

We have done a quick test of the current user workload monitoring state (Tech Preview). We don't think it is ready to be used in production at the moment, but we think that with a few improvements it can be the winning monitoring stack, at least for the Prometheus part (not Grafana, which is out of scope).

Below I add our initial user workload monitoring test, so you can maybe benefit from it in case you haven't looked at it yet.

At the end of the test we have added a few takeaways that we can discuss afterwards; maybe we can work together with people from the OpenShift monitoring team to provide real feedback to help improve the product.

User Workload Monitoring Test

We have done a quick test of how user workload monitoring works, in order to check its current features and its viability for the 3scale product (both on-prem and SaaS). We have used the latest OCP 4.4.0-rc.4.

Architecture

The architecture is described at https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/user-workload-monitoring.md (see the architecture diagram there):

1. There are two Prometheus instances (each one scraping its own ServiceMonitors): the existing cluster Prometheus in openshift-monitoring, and the new user workload Prometheus in openshift-user-workload-monitoring.

2. Then there is a Thanos instance, which takes data from the two previous Prometheus time series databases, so it has both cluster data (resources, memory/cpu usage...) and application data.

3. Finally, any Prometheus data consumer (like the openshift-console Metrics tab, grafana, kiali...) should take data from Thanos (the one having all the data).

How to set it up

Following the official docs, you basically need to create a new ConfigMap in the namespace openshift-monitoring:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    techPreviewUserWorkload:
      enabled: true

And immediately the user workload monitoring stack is created in the namespace openshift-user-workload-monitoring.

Then you just need to deploy any application with metrics in any namespace, and the user workload Prometheus will pick it up through its ServiceMonitor.

In our case, we just deployed a redis prometheus-exporter in the namespace prometheus-exporter, as shown below.
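For illustration, a ServiceMonitor along these lines is enough (a sketch; the labels and port name are assumptions based on our test setup):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-exporter
  namespace: prometheus-exporter
spec:
  selector:
    matchLabels:
      app: redis-exporter   # must match the exporter Service's labels
  endpoints:
    - port: metrics         # named Service port exposing /metrics
      interval: 30s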

Metrics

Then, if you go to the OpenShift Console, inside the application namespace prometheus-exporter, and open the Monitoring > Metrics tab, you can execute PromQL queries. We have executed two different queries:

Both queries show data, because we are really executing the queries against the Thanos instance (the one with data from both Prometheus stacks).

The documentation says that, as an administrator, you can go to the Prometheus UI (not the embedded Prometheus querier inside the OpenShift console) by clicking on the corresponding link in the console.

And execute application PromQL queries there. But it does not work, because the cluster Prometheus doesn't have application data; application data is in the user-workload-monitoring Prometheus (and also in Thanos, which has everything).

In addition, the user workload monitoring Prometheus does not include a public Route to check current alerts (which personally I find useful), unlike the cluster Prometheus.

Alerts

We have created a sample RedisDown PrometheusRule with fake content (redis_up == 1), so we can fire a fake alert precisely because redis is actually up and running, reporting value 1.
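The rule looks roughly like this (a sketch reconstructed from the description above; the metadata, labels, and annotations are assumptions):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: redis-down
  namespace: prometheus-exporter
spec:
  groups:
    - name: redis.rules
      rules:
        - alert: RedisDown
          # Intentionally inverted: fires while redis is up (redis_up == 1)
          expr: redis_up == 1
          for: 1m
          labels:
            severity: warning
          annotations:
            message: Fake alert to test user workload alerting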

Then the documentation says that the alert should appear in the same OpenShift Console, inside the application namespace prometheus-exporter, under the Monitoring > Alerting tab.

But there, there is no active application alert (only 2 active alerts from the cluster Prometheus), and if we look for our specific sample alert RedisDown in the search box (including firing and not-firing alerts), no alerts are found.

So we can see that, unlike the Metrics tab where queries go to Thanos (which has data from both Prometheus instances), in this case the OpenShift Console Monitoring > Alerting tab seems to show information only about cluster Prometheus alerts.

But if we go to the AlertManager UI by clicking on the corresponding link in the console:

Here we can see both the cluster Prometheus alerts (2 alerts firing) and the application alerts (1 fake alert firing).

So it seems that, for some reason, AlertManager alerts from user-workload-monitoring, although present, are hidden from the embedded Alerting tab in the OpenShift Console.

Grafana

User workload monitoring does not include a Grafana instance (it is out of scope), and the current cluster Grafana is not operator-managed; it is a static deployment with specific volumes mounting specific Kubernetes Grafana dashboards from ConfigMaps.

So if you want application dashboards, you need your own Grafana instance (like the integreatly grafana-operator, for example, with autodiscovery of dashboards using labels).
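With the grafana-operator, dashboard autodiscovery boils down to labeled CRs like this one (a minimal sketch; the label key/value must match whatever dashboardLabelSelector the Grafana CR is configured with):

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: example-dashboard
  namespace: prometheus-exporter
  labels:
    monitoring-key: middleware   # assumed selector label; must match the Grafana CR
spec:
  name: example-dashboard.json
  json: |
    {
      "title": "Example dashboard",
      "panels": [],
      "schemaVersion": 16
    }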

Takeaways