Adds error handling to the configure-grafana-datasource playbook. The task now fails only when the status code is not 200 and the response message does not contain "already exists". This lets the role and playbooks continue when the datasource already exists in Grafana.
register: result
# Conditions in a failed_when list are combined with AND, so both must hold to fail.
failed_when:
  - '"already exists" not in result.json.message'
  - result.status != 200
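For context, the conditions above could sit in a task along these lines. This is only a sketch: the variable names (`grafana_url`, `grafana_user`, `grafana_password`, `datasource_name`, `datasource_url`) and the request body are assumptions for illustration and are not taken from this PR.

```yaml
# Hypothetical task; all variable names and the body fields are assumptions.
- name: Create the Grafana datasource
  ansible.builtin.uri:
    url: "{{ grafana_url }}/api/datasources"
    method: POST
    url_username: "{{ grafana_user }}"
    url_password: "{{ grafana_password }}"
    force_basic_auth: true
    body_format: json
    body:
      name: "{{ datasource_name }}"
      type: prometheus
      url: "{{ datasource_url }}"
      access: proxy
  register: result
  # Fail only if the call did not return 200 AND the datasource
  # is not reported as already existing.
  failed_when:
    - '"already exists" not in result.json.message'
    - result.status != 200
```

Because `failed_when` replaces the module's own failure check, a non-200 response that carries an "already exists" message is treated as success and the play continues.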
The PR also adds the OCP-operated Prometheus as a scrape target in the prometheus.yml template. The template iterates over "{{ datasources.datasource_url }}", removes the https:// prefix if present, and configures the prometheus-k8s URL as a scrape target. Scraping is done through the /metrics endpoint. Prometheus triggers an alert if the up metric returns 0 (either the target is removed or prometheus-k8s stops responding).
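The scrape target described above might look roughly like the following excerpt of a prometheus.yml Jinja2 template. It is a sketch under assumptions: the job name `ocp-prometheus` and the loop variable `ds` are hypothetical, and any authentication or TLS settings the real target needs are left out.

```yaml
# Hypothetical template excerpt; job name and loop variable are assumptions.
scrape_configs:
  - job_name: ocp-prometheus
    scheme: https
    metrics_path: /metrics
    static_configs:
      - targets:
{% for ds in datasources %}
          # Strip the scheme so only host[:port] remains as the target.
          - "{{ ds.datasource_url | regex_replace('^https://', '') }}"
{% endfor %}
```

An alerting rule on the up metric could then be sketched as follows, again with a hypothetical rule name and an assumed `for` duration:

```yaml
groups:
  - name: ocp-prometheus
    rules:
      - alert: OcpPrometheusDown
        # up is 0 when the most recent scrape of the target failed.
        expr: up{job="ocp-prometheus"} == 0
        for: 5m
```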
Adds the add-ocp-cluster.yml playbook, which adds an OCP cluster to monitoring. It sets up the Grafana datasource, configures the operated Prometheus as a target, and can optionally configure an SSL exporter target. Example inventory below:
What does this PR do?
Sets network_mode: host on the Prometheus container
Makes alertmanager_port configurable; it defaults to 9093
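As an illustration of the configurable port, the default could be declared as a role default and overridden from the inventory. The file location is an assumption; only the variable name and its default come from this PR.

```yaml
# defaults/main.yml (hypothetical location)
alertmanager_port: 9093
```

A template could then reference it like this. The `localhost` host is an assumption that relies on Prometheus and Alertmanager sharing the host network:

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:{{ alertmanager_port }}"
```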
How should this be tested?
Run the add-ocp-cluster.yml playbook with a correct "{{ datasources }}" list in the inventory.
Is there a relevant Issue open for this?
Provide a link to any open issues that describe the problem you are solving. resolves #
Other Relevant info, PRs, etc.
Please provide link to other PRs that may be related (blocking, resolves, etc. etc.)
People to notify
cc: @redhat-cop/monitoring