prometheus-community / ansible

Ansible Collection for Prometheus
https://prometheus-community.github.io/ansible/
Apache License 2.0
396 stars 133 forks source link

prometheus_alert_rules_files no longer works as expected since 0.20.0 #454

Open wookietreiber opened 2 weeks ago

wookietreiber commented 2 weeks ago

Just wanted to update the collection to latest, and I recognized this diff.

# config
prometheus_alert_rules_files: >-
  roles/prometheus-eve/files/rules/*.yml

This is output with older version 0.17.0:

TASK [prometheus.prometheus.prometheus : Copy custom alerting rule files] ***
changed: [head1] => (item=/home/.../roles/prometheus-eve/files/rules/memory.yml)
changed: [head1] => (item=/home/.../roles/prometheus-eve/files/rules/slurm.yml)
changed: [head1] => (item=/home/.../roles/prometheus-eve/files/rules/storage.yml)
changed: [head1] => (item=/home/.../roles/prometheus-eve/files/rules/time.yml)
changed: [head1] => (item=/home/.../roles/prometheus-eve/files/rules/spectrum-scale.yml)
changed: [head1] => (item=/home/.../roles/prometheus-eve/files/rules/network.yml)
changed: [head1] => (item=/home/.../roles/prometheus-eve/files/rules/cpu.yml)

This is with current latest 0.22.0:

TASK [prometheus.prometheus.prometheus : Copy custom alerting rule files] ***
changed: [head1] => (item=test.yml)
changed: [head1] => (item=filer.yml)
changed: [head1] => (item=ANSIBLE_RUNLOG.md)
changed: [head1] => (item=CONTRIBUTING.md)
changed: [head1] => (item=compute.yml)
changed: [head1] => (item=ansible.cfg)
changed: [head1] => (item=services.yml)
changed: [head1] => (item=infrastruktur.yml)
changed: [head1] => (item=README.md)
changed: [head1] => (item=local.yml)
changed: [head1] => (item=bmc.yml)
changed: [head1] => (item=requirements.yml)
changed: [head1] => (item=heads.yml)
changed: [head1] => (item=interactive.yml)
changed: [head1] => (item=site.yml)
changed: [head1] => (item=global.yml)
changed: [head1] => (item=vault-keyring-client.py)
changed: [head1] => (item=Makefile)
changed: [head1] => (item=hosts)

These are the files in my top-level ansible directory from where I run my top-level playbooks:

$ ls
ad-hoc             bmc.yml          filer.yml   heads.yml  infrastruktur.yml  Makefile          roles         test.yml
ansible.cfg        compute.yml      global.yml  host_vars  interactive.yml    README.md         services.yml  vault-keyring-client.py
ANSIBLE_RUNLOG.md  CONTRIBUTING.md  group_vars  hosts      local.yml          requirements.yml  site.yml

$ ansible-playbook infrastruktur.yml -l head1 -t prometheus-main -C

So there seems to be something wrong with the globbing happening with prometheus_alert_rules_files and the Copy custom alerting rule files task. The diff from 0.17.0 to 0.22.0 (git diff 0.17.0..0.22.0 roles/prometheus/tasks/configure.yml):

 - name: Copy custom alerting rule files
   ansible.builtin.copy:
     src: "{{ item }}"
     dest: "{{ prometheus_config_dir }}/rules/"
-    owner: root
+    owner: "{{ prometheus_system_user }}"
     group: "{{ prometheus_system_group }}"
     mode: 0640
     validate: "{{ _prometheus_binary_install_dir }}/promtool check rules %s"
-  with_fileglob: "{{ prometheus_alert_rules_files }}"
+  loop: "{{ prometheus_alert_rules_files | map('ansible.builtin.fileglob') | flatten }}"
   when:
     - not prometheus_agent_mode
   notify:
     - reload prometheus
+  become: true
+  tags:
+    - prometheus
+    - configure
+    - prometheus_configure

This was done in 1e4e4c34156900d427a65430cd3eba805b441851 before the 0.20.0 release.

gardar commented 2 weeks ago

The prometheus_alert_rules_files variable should be a list.

# config
prometheus_alert_rules_files:
  - roles/prometheus-eve/files/rules/*.yml

Not sure why the argument spec validation isn't catching this, it was also reported in #444

When you run your playbook, do you see a task named Validating arguments against arg spec 'main' being executed?

wookietreiber commented 2 weeks ago

Thanks, changing to:

prometheus_alert_rules_files:
  - roles/prometheus-eve/files/rules/*.yml

... fixed the issue:

TASK [prometheus.prometheus.prometheus : Copy custom alerting rule files] ***
changed: [head1] => (item=roles/prometheus-eve/files/rules/memory.yml)
changed: [head1] => (item=roles/prometheus-eve/files/rules/slurm.yml)
changed: [head1] => (item=roles/prometheus-eve/files/rules/storage.yml)
changed: [head1] => (item=roles/prometheus-eve/files/rules/time.yml)
changed: [head1] => (item=roles/prometheus-eve/files/rules/spectrum-scale.yml)
changed: [head1] => (item=roles/prometheus-eve/files/rules/network.yml)
changed: [head1] => (item=roles/prometheus-eve/files/rules/cpu.yml)

With my previous config, my mistake wasn't caught:

TASK [prometheus.prometheus.prometheus : Validating arguments against arg spec 'main' - Installs and configures prometheus] ***
ok: [head1]
wookietreiber commented 2 weeks ago

I figure, the check/validation might be incomplete, because it's skipping:

TASK [prometheus.prometheus.prometheus : Alert when alert rules files are found in the old .rules format] ***
skipping: [head1]
- name: Alert when alert rules files are found in the old .rules format
  ansible.builtin.debug:
    msg: >
      prometheus_alert_rules_files contains rules in the old .rules file format.
      This format has been deprecated in favour of the new YAML format since Prometheus v2.x.
      You can update your rules using promtool: promtool update rules <filenames>

      If your rules are already in the YAML format but with the .rules extension, you can safely ignore this message,
      or rename the files to <filename>.yaml or <filename>.yml to get rid of it.
  when: prometheus_alert_rules_files | select('search', '.rules$') | list | length > 0

It looks only for matches ending in .rules, not if the variable has a list format. I suggest adding this check:

- name: Validate that prometheus_alert_rules_files is a list
  ansible.builtin.assert:
    that:
      - prometheus_alert_rules_files is iterable
      - prometheus_alert_rules_files is not string
    fail_msg: '`prometheus_alert_rules_files` must be a list'
    success_msg: '`prometheus_alert_rules_files` is a list'
  when:
    - prometheus_alert_rules_files is defined

Can do PR if you want.

gardar commented 2 weeks ago

That assert task was introduced in #333 and is to help users migrating from .rules to .yml and is unrelated to this.

While we could definitely add a task to validate each variable I don't think it's the right thing to do as it is the purpose of the Role argument validation which we have already in place roles/prometheus/meta/argument_specs.yml, we just need to investigate why it's not working as expected.

gardar commented 2 weeks ago

Looks like this is by design as apparently the argument validation is currently just for ensuring the variable can be converted into the type specified in the argument spec and is not for enforcing the type of the variable. But there is a open PR over at https://github.com/ansible/ansible/pull/81575 which would enforce the variable type as specified in the argument spec.

In the meantime we could probably just convert the prometheus_alert_rules_files variable to a list at the task level with something like: loop: "{{ [prometheus_alert_rules_files] | map('ansible.builtin.fileglob') | flatten }}" or loop: "{{ prometheus_alert_rules_files | list | map('ansible.builtin.fileglob') | flatten }}"