Open brumer opened 5 years ago
@patrick-east I don't see any reason why we couldn't expose the various Rego metrics via Prometheus, in addition to the server handler. Thoughts?
Seems like it shouldn't be a problem. We might want to flesh out the design a little for how we structure the metric(s). I could see different use cases where people care about different things. For example, do we keep a histogram/summary for each API endpoint/query that is evaluated, or do we aggregate them all? Someone who just wants to monitor the OPA instance may want to see the aggregate, but someone authoring or managing the policies might care about individual ones.
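To make the two options concrete, here is a minimal stdlib-only sketch of a latency histogram that keeps one series per decision path and can also collapse them into a single aggregate. It is not OPA code and does not use the Prometheus client library; the class name, the bucket bounds, and the example paths are all hypothetical and only illustrate the trade-off.

```python
from collections import defaultdict

# Hypothetical bucket upper bounds in seconds; a real deployment would tune these.
BUCKETS = (0.001, 0.01, 0.1, 1.0)


class LatencyHistogram:
    """Cumulative-bucket histogram keyed by decision path (illustrative only)."""

    def __init__(self, buckets=BUCKETS):
        self.buckets = tuple(sorted(buckets))
        # path -> cumulative counts per bucket; the last slot is the +Inf bucket
        self.counts = defaultdict(lambda: [0] * (len(self.buckets) + 1))
        self.sums = defaultdict(float)

    def observe(self, path, seconds):
        """Record one evaluation latency for the given decision path."""
        row = self.counts[path]
        for i, bound in enumerate(self.buckets):
            if seconds <= bound:
                row[i] += 1
        row[-1] += 1  # the +Inf bucket counts every observation
        self.sums[path] += seconds

    def per_path(self):
        """View for policy authors: one series per decision path."""
        return {path: list(row) for path, row in self.counts.items()}

    def aggregate(self):
        """View for operators: all paths summed into a single series."""
        total = [0] * (len(self.buckets) + 1)
        for row in self.counts.values():
            total = [a + b for a, b in zip(total, row)]
        return total


h = LatencyHistogram()
h.observe("authz/allow", 0.0005)
h.observe("authz/allow", 0.05)
h.observe("billing/quota", 0.5)
print(h.per_path())
print(h.aggregate())
```

Keeping the path as a label means the aggregate can always be derived from the per-path series, while the reverse is not possible. The cost is one set of buckets per distinct path, which can grow large if many different queries are evaluated.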
I think a starting point would be to extend the existing /data and unversioned POST endpoints to include the rego latencies. In many deployments only a single endpoint will actually get used. I'm partial to the unversioned POST endpoint since it avoids the problem of configuring the client with the name of the decision to request.
Removing this from TODO since we don't have any plans to work on this soon.
Expected Behavior
Performance metrics that can be requested on individual API calls should also be exposed as Prometheus metrics.
Actual Behavior
The metrics are exposed only on individual API calls, and only when the metrics=true query parameter is specified.
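For reference, the current per-call behavior can be sketched from the client side as follows. This is a minimal example, assuming an OPA server on the default address http://localhost:8181 and a hypothetical decision path example/allow; the response body and the timer values in it are made up for illustration, and the helper names are not part of any OPA client.

```python
import json
from urllib.parse import urlencode


def metrics_url(base, decision_path):
    """Build a Data API URL that asks OPA to include metrics in the response."""
    query = urlencode({"metrics": "true"})
    return f"{base.rstrip('/')}/v1/data/{decision_path.strip('/')}?{query}"


def rego_timers(response_body):
    """Pick the Rego timers out of a Data API response body."""
    metrics = json.loads(response_body).get("metrics", {})
    return {k: v for k, v in metrics.items() if k.startswith("timer_rego_")}


# Illustrative response body; the numbers are invented.
sample = json.dumps({
    "result": True,
    "metrics": {
        "timer_rego_query_eval_ns": 1200,
        "timer_server_handler_ns": 5000,
    },
})

print(metrics_url("http://localhost:8181", "example/allow"))
print(rego_timers(sample))
```

A caller has to opt in on every request and scrape the numbers out of each response, which is the gap this issue asks Prometheus to fill.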