janhorstmann opened 7 months ago
@berendt
I am unsure how best to approach this. My intention was to remove the hard-coded rules in the testbed and replace them with the rules from `kolla-operations`, so that they can be updated in one place and always match the Kolla release in use, since exporters and alerting practices may change between releases.
`kolla-operations` is currently included in the `kolla-ansible` container but, to the best of my knowledge, is not deployed from there. I also see no way to achieve that without interfering with user-provided overlays.
One option, albeit incomplete, would be to bundle the rules into the Prometheus container, as is already done with the dashboards in the Grafana container. The rules inside the container (e.g. under `/operations`) could then be referenced in a Prometheus config overlay, e.g. `environments/kolla/files/overlays/prometheus/prometheus.yml.d/50-kolla-operations.yml`:
```yaml
---
rule_files:
  - /operations/prometheus/*.rules
```
However, this approach ignores the Prometheus config extensions, which are also part of `kolla-operations`.
Another approach I considered was adding a new config option to the cookiecutter and then conditionally checking out the `kolla-operations` repository in a `pre_gen` hook. However, this would leave the task of updating the rules to the user during upgrades. Since the testbed may be used to deploy different versions of OSISM, this would not be flexible enough for automatically deploying the Prometheus rules.
One could also use `gilt` to automatically check out and update the rules. Running `gilt` is a step that is always required during upgrades anyway, but this would probably add a step to set the version of `kolla-operations`. It would probably also require a different versioning scheme, with stable branches for the different Kolla releases.
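A minimal sketch of what such a `gilt` entry might look like, assuming the `git`/`version`/`files` (`src`, `dst`) layout used by the Python `gilt` releases. The repository URL, the branch name `stable/2023.1` and the target directory are assumptions for illustration only; the branch in particular presupposes the per-release stable branches mentioned above, which do not necessarily exist.

```yaml
---
# Hypothetical gilt.yml entry: URL, branch and paths are assumptions.
- git: https://github.com/osism/kolla-operations.git
  # Hypothetical stable branch matching the Kolla release in use
  version: stable/2023.1
  files:
    # Copy all rule files into the testbed's Prometheus overlay directory
    - src: prometheus/*.rules
      dst: environments/kolla/files/overlays/prometheus/
```

Pinning `version` like this is exactly the extra upgrade step: the value would have to be bumped whenever the Kolla release changes. Copying into the overlay directory would also overwrite any user-provided file of the same name there.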
Right now I am inclined to add the rules to the Kolla Prometheus container and extend the cookiecutter with an extension YAML file referencing the rules as shown above.
It was decided to patch `kolla-ansible` and add tasks to deploy `kolla-operations` from there.
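Such a deployment step could look roughly like the following task. It is a sketch modeled loosely on how the `kolla-ansible` Prometheus role copies custom `*.rules` files into `{{ node_config_directory }}/prometheus-server/`, not the actual patch. The variable `kolla_operations_rules_dir` is hypothetical and stands for the location of the `kolla-operations` rules inside the `kolla-ansible` container.

```yaml
# Hypothetical task for the kolla-ansible prometheus role.
# kolla_operations_rules_dir is an assumed variable, not an existing one.
- name: Copying over kolla-operations alert rules
  become: true
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: "{{ node_config_directory }}/prometheus-server/{{ item | basename }}"
    mode: "0660"
  # fileglob is evaluated on the Ansible controller (the kolla-ansible container)
  with_fileglob:
    - "{{ kolla_operations_rules_dir }}/*.rules"
  when: inventory_hostname in groups['prometheus']
```

This only covers copying the files to the hosts. The rules would presumably still have to be made available inside the container and listed under `rule_files` in `prometheus.yml`, and a change would normally need to trigger a restart of the Prometheus server container.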
The testbed currently has its own Prometheus rules in `environments/kolla/files/overlays/prometheus`. These rules are not synced with the rules provided in `kolla-operations` and, at least in the case of `ContainerVolumeIoUsage`, are faulty (a mix-up of absolute and relative values). The testbed (and possibly OSISM in general) should deploy the rules defined in `kolla-operations`.