ibm-mas / ansible-devops

Ansible collection supporting devops for IBM Maximo Application Suite
https://ibm-mas.github.io/ansible-devops/
Eclipse Public License 2.0

Grafana role should check for cluster monitoring/prometheus as a dependency #1388

Open stonepd opened 1 month ago

stonepd commented 1 month ago

If the grafana role is run before ocp_cluster_monitoring, it fails with the following error:

TASK [ibm.mas_devops.grafana : install : Get Prometheus token secret] **********
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (12 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (11 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (10 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (9 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (8 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (7 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (6 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (5 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (4 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (3 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (2 retries left).
FAILED - RETRYING: [localhost]: install : Get Prometheus token secret (1 retries left).
fatal: [localhost]: FAILED! => changed=true 
  attempts: 12
  cmd: |-
    oc get secret -n openshift-user-workload-monitoring | grep prometheus-user-workload-token | awk '{print $1}'
  delta: '0:00:00.283886'
  end: '2024-07-08 15:59:36.636621'
  msg: ''
  rc: 0
  start: '2024-07-08 15:59:36.352735'
  stderr: ''
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

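This failure should be handled better. The role should either emit a more informative error message or run a dependency check at the start to confirm that cluster monitoring has been configured before the grafana installation begins. As it stands, the command exits with rc 0 and empty stdout, so the task retries 12 times and then fails without saying what is missing.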
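
One way such a dependency check could look is sketched below. This is only a minimal sketch, not the role's actual implementation. It assumes the `kubernetes.core` collection is installed, and it reuses the `openshift-user-workload-monitoring` namespace and the `prometheus-user-workload-token` name prefix from the failing command above. The task names, the `uwm_secrets` variable and the failure message are hypothetical.

```yaml
# Hypothetical pre-check, intended to run before the grafana install tasks.
# Assumption: the kubernetes.core collection is available on the controller.
- name: "pre-check : List secrets in openshift-user-workload-monitoring"
  kubernetes.core.k8s_info:
    api_version: v1
    kind: Secret
    namespace: openshift-user-workload-monitoring
  register: uwm_secrets  # hypothetical variable name

# Fail immediately with a readable message instead of retrying 12 times.
- name: "pre-check : Assert that a Prometheus token secret exists"
  ansible.builtin.assert:
    that:
      # 'match' anchors at the start of the string, so this is a prefix check
      - uwm_secrets.resources | selectattr('metadata.name', 'match', 'prometheus-user-workload-token') | list | length > 0
    fail_msg: >-
      No prometheus-user-workload-token secret was found in the
      openshift-user-workload-monitoring namespace. Cluster monitoring
      does not appear to be configured; run the ocp_cluster_monitoring
      role before the grafana role.
```

A check like this fails on the first attempt with an actionable message, which may be preferable to the current retry loop when the secret is never going to appear. It still ties the role to the token secret name, so it would need revisiting if the way Prometheus exposes its token changes.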

JonahLuckett commented 2 weeks ago

Prometheus is due to change a lot during OCP 4.16 - @IanBoden to look into the above