osism / issues

This repository is used for bug reports that are cross-project or not bound to a specific repository (or to an unknown repository).
https://www.osism.tech

Replace testbed prometheus rules with rules from kolla-operations #1012

Open janhorstmann opened 7 months ago

janhorstmann commented 7 months ago

The testbed currently ships its own prometheus rules in environments/kolla/files/overlays/prometheus. These rules are not synced with the rules provided in kolla-operations and, at least in the case of ContainerVolumeIoUsage, are faulty (absolute and relative values are mixed up).

The testbed (and possibly osism in general) should deploy the rules defined in kolla-operations.

janhorstmann commented 6 months ago

@berendt

I am unsure how best to approach this. My intention was to remove the hardcoded rules from the testbed and replace them with the rules from kolla-operations. That way they can be maintained in one place and always match the kolla release in use, since exporters and alerting practices may change between releases.

kolla-operations is currently included in the kolla-ansible container but, to the best of my knowledge, is not deployed from there. I also see no way to achieve that without interfering with user-provided overlays.

One (incomplete) option would be to bundle the rules in the prometheus container, as is already done for the dashboards in the grafana container. The rules inside the container (e.g. /operations) could then be referenced in a prometheus config overlay, e.g. environments/kolla/files/overlays/prometheus/prometheus.yml.d/50-kolla-operations.yml:

```yaml
---
rule_files:
  - /operations/prometheus/*.rules
```

This approach, however, ignores the prometheus config extensions, which are also part of kolla-operations.

Another approach I considered was adding a new config option to the cookiecutter and then conditionally checking out the kolla-operations repository in a pre_gen hook. This, however, would leave it to the user to update the rules during upgrades. Since the testbed may be used to deploy different versions of OSISM, this would not be flexible enough for automatic deployment of the prometheus rules.

One could also use gilt to check out and update the rules automatically. Running gilt is already a required step during upgrades, but this would probably add a step to set the kolla-operations version. It would probably also require a different versioning scheme, with stable branches for the different kolla releases.
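The gilt option could look roughly like the gilt.yml entry sketched here. It rests on several assumptions: the Python gilt configuration format (a list of `git`/`version`/`files` entries), the repository URL https://github.com/osism/kolla-operations, and rules stored as `*.rules` files under `prometheus/` (as the container path above suggests). The branch name `stable-2023.1` is hypothetical and stands in for the stable-branch scheme this option would require. Note that copying into the overlay directory would place the rules next to user-provided overlays.

```yaml
---
# Sketch only: assumes the Python gilt config format and the
# kolla-operations repository layout described above.
- git: https://github.com/osism/kolla-operations.git
  # Hypothetical stable branch matching the kolla release in use.
  version: stable-2023.1
  files:
    # Copy all rule files into the testbed's prometheus overlay directory.
    - src: prometheus/*.rules
      dst: environments/kolla/files/overlays/prometheus/
```

With this approach, selecting the matching branch becomes part of the upgrade procedure. That is the extra versioning step mentioned above.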


Right now I am inclined to add the rules to the kolla prometheus container and extend the cookiecutter with an extension YAML file that references the rules, as shown above.


It was decided to patch kolla-ansible and add tasks there to deploy kolla-operations.
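A task of this kind in the kolla-ansible prometheus role might look roughly like the sketch below. The variable `kolla_operations_path`, meaning the location of kolla-operations inside the kolla-ansible container, is hypothetical. The sketch also assumes that the role's Prometheus configuration already loads `*.rules` files from the prometheus-server config directory. The actual patch may well be structured differently.

```yaml
---
# Hypothetical task sketch for the kolla-ansible prometheus role.
- name: Copying over kolla-operations prometheus rules
  become: true
  ansible.builtin.copy:
    # with_fileglob is evaluated on the Ansible controller, i.e. inside the
    # kolla-ansible container where kolla-operations is already included.
    src: "{{ item }}"
    dest: "{{ node_config_directory }}/prometheus-server/{{ item | basename }}"
    mode: "0660"
  with_fileglob:
    # kolla_operations_path is a hypothetical variable, not an existing one.
    - "{{ kolla_operations_path }}/prometheus/*.rules"
```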