prometheus / prometheus

The Prometheus monitoring system and time series database.
https://prometheus.io/
Apache License 2.0

Handle saves/deletes on /api/v1/rules #10314

Open gillg opened 2 years ago

gillg commented 2 years ago

Proposal

Is your proposal related to a problem?

Since v8, Grafana has offered a very useful alerting UI for end users. Behind the scenes it relies on an internal Alertmanager that cannot receive alerts from outside, on the native Alertmanager APIs to manage a remote instance, and on some Cortex APIs to manage all alerting from one place.

Using Grafana to evaluate alerting rules makes it a single point of failure (SPOF) and puts evaluation late in the alerting stack, so you can also use this API natively: https://cortexmetrics.io/docs/api/#set-rule-group. Are there any plans to implement this endpoint?

Describe the solution you'd like

The best option would be to handle the POST and DELETE verbs on the existing rules API, /api/v1/rules. We should be able to configure Prometheus, via its config file, to load rules from a folder such as /etc/prometheus/remote-rules.d/*.yml. On the command line we could have a flag such as --rules.remote-management.store-path=/etc/prometheus/remote-rules.d/, and one YAML file would then be created per "namespace" (see the Cortex API doc).
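To make the "one YAML file per namespace" idea concrete, here is a minimal sketch of how a namespace could map to a file under the store path. Nothing here exists in Prometheus: the function name `rule_file_for`, the allowed character set for namespaces, and the `.yml` naming are assumptions made for illustration, and the store path is simply the value suggested above.

```python
import re
from pathlib import Path

# Hypothetical store path: the value suggested for the proposed
# --rules.remote-management.store-path flag (this flag does not exist today).
STORE_PATH = Path("/etc/prometheus/remote-rules.d")

# Assumption: namespaces are limited to a conservative character set so that
# a namespace can never escape the store directory (no "/" and no leading ".").
_NAMESPACE_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*")


def rule_file_for(namespace: str, store_path: Path = STORE_PATH) -> Path:
    """Return the rule file that would hold all groups of one namespace."""
    if not _NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"invalid namespace: {namespace!r}")
    # One file per namespace, matched by the remote-rules.d/*.yml glob.
    return store_path / f"{namespace}.yml"
```

With this mapping, a namespace named `team-a` would be stored as `team-a.yml` inside the store path, and a name such as `../etc/passwd` would be rejected before any file is touched.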

Describe alternatives you've considered

The only homemade alternative is a reverse proxy that handles POST and DELETE requests on /api/v1/rules, plus a tool in charge of creating files matching /etc/prometheus/remote-rules.d/*.yml.
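A rough sketch of the decision logic such a proxy and tool could share is shown below. It is only an illustration under stated assumptions: the `/api/v1/rules/<namespace>` path shape and the 202 status are modelled on the Cortex ruler API linked above, the function name `handle_rules_request` is hypothetical, and, as a simplification, the request body is treated as the complete rule file for the namespace rather than a single rule group that would need merging.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

# Assumed path shape, modelled on the Cortex API: /api/v1/rules/<namespace>
RULES_PREFIX = "/api/v1/rules/"


def handle_rules_request(method: str, path: str, body: bytes,
                         store_path: Path) -> Optional[int]:
    """Return an HTTP status if handled locally, or None to forward upstream."""
    if method not in ("POST", "DELETE") or not path.startswith(RULES_PREFIX):
        return None  # e.g. GET /api/v1/rules goes to Prometheus unchanged
    namespace = path[len(RULES_PREFIX):]
    # Reject empty, nested, or dot-prefixed names to stay inside store_path.
    if not namespace or "/" in namespace or namespace.startswith("."):
        return 400
    target = store_path / f"{namespace}.yml"
    if method == "POST":
        # Simplification: the body overwrites the whole namespace file.
        # Write to a temp file first so a *.yml glob never sees a partial file.
        fd, tmp_name = tempfile.mkstemp(dir=store_path, suffix=".tmp")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(body)
        os.replace(tmp_name, target)
        return 202
    # DELETE: remove the namespace file if it exists.
    try:
        target.unlink()
    except FileNotFoundError:
        return 404
    return 202
```

Prometheus only picks up changed rule files on a reload, so a real tool would also need to trigger one afterwards, for example with SIGHUP or with a POST to /-/reload when --web.enable-lifecycle is set.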

This proposal partially duplicates https://github.com/thanos-io/thanos/issues/5168, but both are useful if you want to dispatch alerts at different levels.

roidelapluie commented 2 years ago

Hello,

Thanks for your message. At the next dev summit, I plan to propose optionally backing the Prometheus config with a consistent store, e.g. etcd. We'll see what happens then, but it could be a first step toward addressing this issue.

gillg commented 2 years ago

Great! That's true, it could be better for a k8s model, where written config files will not persist...

Why not also use a dedicated rules folder in the data dir as a first approach? We know the data will persist in any architecture. And if for any reason that's not the case, you just use regular configs and not "remote rules management".

An additional flag --alerts.remote-management-enabled=true would be useful

roidelapluie commented 2 years ago

I'd like to avoid mangling files on disk. If we were to support something like this, it would be hard to remove it later.

I think it makes more sense to go directly to a database rather than letting the users access the local filesystem.

DrAuYueng commented 2 years ago

Sounds good.

BlueBlue-Lee commented 1 year ago

I'd like to know whether there is any roadmap for natively supporting POST/PUT/GET/DELETE APIs for rules in Prometheus. Managing rule files is rather hard and inconvenient.

@roidelapluie