redhat-cop / monitoring

Assets to manage monitoring infrastructure and applications

Add playbook for adding operated prometheus, fix error handling in configure-grafana-datasource #20

Closed: tkummer33 closed this 4 years ago

tkummer33 commented 4 years ago

What does this PR do?

Adds error handling to the configure-grafana-datasource playbook. The task now fails only when the return code is not 200 and the message does not contain "already exists". This ensures that the role/playbooks continue even when the datasource already exists in Grafana.

  register: result
  failed_when:
    # Both conditions must hold (failed_when list entries are ANDed)
    - result.status != 200
    - '"already exists" not in (result.json | default({})).get("message", "")'
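
The failure rule above can be restated as a small Python function, which makes the AND semantics easy to check against sample responses. This is a minimal sketch, not part of the playbook: the function name datasource_task_failed is hypothetical, and treating a missing or non-JSON body as an empty message is an assumption mirroring the default({}) guard.

```python
from typing import Any, Mapping, Optional


def datasource_task_failed(status: int, body: Optional[Mapping[str, Any]]) -> bool:
    """Mirror the failed_when list: fail only if BOTH conditions hold.

    Ansible ANDs the entries of a failed_when list, so the task fails only
    when the status is not 200 AND the message lacks "already exists".
    """
    # Assumption: a missing body or a null message counts as an empty message.
    message = (body or {}).get("message") or ""
    return status != 200 and "already exists" not in message
```

With illustrative (not captured) responses, a 200 with any message passes, a non-200 whose message contains "already exists" passes, and any other non-200 response, including one without a JSON body, fails.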

The PR also adds the OCP operated Prometheus as a scrape target in the prometheus.yml template. The template iterates over the "{{ datasources }}" list and, for each entry, removes the https:// prefix from datasource_url if needed and configures that prometheus-k8s URL as a scrape target. Metrics are scraped from the /metrics endpoint. Prometheus triggers an alert if the up metric returns 0 (either the target has been removed or prometheus-k8s has stopped responding). Example inventory:

datasources:
- name: "openshift-1"
  datasource_url: "https://prometheus-k8s-monitoring.apps.openshift-1.example.com"
  bearer_token: "my_secret_token"
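
To illustrate what the template derives from each datasources entry, here is a Python sketch that builds the equivalent scrape jobs as plain dictionaries. The field names (job_name, scheme, metrics_path, bearer_token, static_configs) follow the Prometheus scrape_config format, but naming the job after the datasource and setting scheme explicitly are assumptions; the PR's actual Jinja template is not shown here. The matching alert condition would be a PromQL expression along the lines of up == 0.

```python
from typing import Any, Dict, List


def strip_scheme(url: str) -> str:
    """Remove a leading https:// so the value can be used as a target address."""
    prefix = "https://"
    return url[len(prefix):] if url.startswith(prefix) else url


def build_scrape_configs(datasources: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build one Prometheus scrape job per datasource entry (sketch)."""
    jobs = []
    for ds in datasources:
        jobs.append({
            "job_name": ds["name"],      # assumption: job named after the datasource
            "scheme": "https",           # the prefix was stripped, so state the scheme
            "metrics_path": "/metrics",  # prometheus-k8s self-metrics endpoint
            "bearer_token": ds["bearer_token"],
            "static_configs": [{"targets": [strip_scheme(ds["datasource_url"])]}],
        })
    return jobs
```

Fed the example inventory, this yields a single job whose target is prometheus-k8s-monitoring.apps.openshift-1.example.com.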

Adds the add-ocp-cluster.yml playbook, which adds an OCP cluster to monitoring. It sets up the Grafana datasource, configures the operated Prometheus as a target, and can optionally configure an SSL exporter target. Example inventory:

datasources:
- name: "openshift-1"
  datasource_url: "https://prometheus-k8s-monitoring.apps.openshift-1.example.com"
  bearer_token: "my_secret_token"

ssl_certs:
  - prometheus-k8s-monitoring.apps.openshift-1.example.com:443
  - api.openshift-1.example.com:6443
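
The playbook relies on every datasources entry carrying name, datasource_url and bearer_token, and on every ssl_certs entry being a host:port pair. The sketch below shows a hypothetical pre-flight check of that inventory shape in Python; check_datasources and parse_ssl_targets are illustrative names, not functions from the PR, and the playbook itself may not perform any such validation.

```python
from typing import Any, Dict, List, Tuple

# Keys used by the example inventory for each datasource.
REQUIRED_KEYS = ("name", "datasource_url", "bearer_token")


def check_datasources(datasources: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found in the datasources inventory list."""
    problems = []
    for i, ds in enumerate(datasources):
        for key in REQUIRED_KEYS:
            if not ds.get(key):
                problems.append(f"datasources[{i}] is missing '{key}'")
    return problems


def parse_ssl_targets(ssl_certs: List[str]) -> List[Tuple[str, int]]:
    """Split each ssl_certs entry of the form host:port into (host, port)."""
    targets = []
    for entry in ssl_certs:
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        targets.append((host, int(port)))
    return targets
```

Splitting on the last colon keeps the host intact and rejects entries without an explicit port.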

Sets network_mode: host on the Prometheus container.

Makes alertmanager_port configurable (defaults to 9093).
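
A configurable port with a fallback is what an Ansible role default or a Jinja "{{ alertmanager_port | default(9093) }}" expression provides. The Python sketch below shows the resulting Prometheus alerting section under two assumptions not stated in the PR: Alertmanager runs on the same host, and because Prometheus uses host networking it can reach Alertmanager via localhost. The function name alerting_config is hypothetical.

```python
from typing import Any, Dict, Mapping

DEFAULT_ALERTMANAGER_PORT = 9093


def alerting_config(inventory_vars: Mapping[str, Any]) -> Dict[str, Any]:
    """Build a Prometheus 'alerting' section using alertmanager_port.

    Assumes Alertmanager is on the same host and reachable via localhost
    because the Prometheus container runs with network_mode: host.
    """
    port = int(inventory_vars.get("alertmanager_port", DEFAULT_ALERTMANAGER_PORT))
    return {
        "alerting": {
            "alertmanagers": [
                {"static_configs": [{"targets": [f"localhost:{port}"]}]}
            ]
        }
    }
```

Leaving alertmanager_port unset produces the target localhost:9093; setting it in the inventory overrides only the port.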

How should this be tested?

Run the add-ocp-cluster.yml playbook with a correct "{{ datasources }}" list in the inventory.

Is there a relevant Issue open for this?

Provide a link to any open issues that describe the problem you are solving. resolves #

Other Relevant info, PRs, etc.

Please provide link to other PRs that may be related (blocking, resolves, etc. etc.)

People to notify

cc: @redhat-cop/monitoring