ory / keto

The most scalable and customizable permission server on the market. Fix your slow or broken permission system with Google's proven "Zanzibar" approach. Supports ACL, RBAC, and more. Written in Go, cloud native, headless, API-first. Available as a service on Ory Network and for self-hosters.
Apache License 2.0

Prometheus metrics not working #1611

Open eroznik opened 3 weeks ago

eroznik commented 3 weeks ago

Preflight checklist

Ory Network Project

No response

Describe the bug

When requests are executed against the Keto instance, Prometheus metrics do not change as expected.

For example, after 7 invocations the metrics for the check endpoint look like this:

... ❯❯❯ curl http://localhost:4466/metrics/prometheus | grep http_requests_duration_seconds_bucket | grep "/relation-tuples/check"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 83104    0 83104    0     0  39.6M      0 --:--:-- --:--:-- --:--:-- 39.6M
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="0.005"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="0.01"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="0.025"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="0.05"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="0.1"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="0.25"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="0.5"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="1"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="2.5"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="5"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="10"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="+Inf"} 7

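The expectation is that the counts differ from bucket to bucket, since latency varies from request to request and it is very unlikely that every request completes within the smallest bucket. The log output below shows durations between roughly 200 ms and 1 s, yet all 7 observations are counted in the le="0.005" bucket.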
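To make the expectation concrete, here is a minimal sketch of the cumulative bucket counts one would expect for the "took" values visible in the log output of this report. It assumes the histogram uses the bucket bounds shown in the `le` labels above and that buckets are cumulative, as in the Prometheus exposition format. Only six durations are visible in the (truncated) log, so the totals are 6 rather than 7. The function name `cumulative_buckets` is just for illustration and is not part of Keto.

```python
import math

# Upper bounds copied from the `le` labels in the metrics output of this report.
BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, math.inf]

# "took" values from the log output of this report, converted to seconds.
DURATIONS = [
    0.760507203,
    0.999676135,
    0.208008314,
    0.40100658,
    0.970554767,
    0.922347091,
]


def cumulative_buckets(durations, bounds):
    """Count, for each upper bound, how many durations are <= that bound."""
    return {le: sum(1 for d in durations if d <= le) for le in bounds}


counts = cumulative_buckets(DURATIONS, BUCKETS)
for le, n in counts.items():
    label = "+Inf" if math.isinf(le) else str(le)
    print(f'le="{label}" {n}')
```

With these durations the buckets up to le="0.1" would stay at 0 and the counts would only start to rise at le="0.25", which is quite different from the flat series reported above.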

Reproducing the bug

  1. Start Keto.
  2. Execute some test requests, e.g. http POST http://localhost:4466/relation-tuples/check namespace=Test
  3. Check the metrics for this endpoint: curl http://localhost:4466/metrics/prometheus | grep http_requests_duration_seconds_bucket | grep "/relation-tuples/check"
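The check in step 3 could also be scripted. The following is a rough sketch, not part of Keto, that parses the bucket lines for one endpoint and flags the case where every bucket holds the same non-zero count. The helper names (`parse_buckets`, `looks_flat`, `fetch_metrics`) are hypothetical, the label parsing is deliberately simple (it does not handle escaped quotes and does not separate series by status code), and `SAMPLE` holds two lines copied from this report so the parsing can be tried without a running instance.

```python
import re
import urllib.request

# URL taken from the curl command in this report.
METRICS_URL = "http://localhost:4466/metrics/prometheus"

BUCKET_LINE = re.compile(
    r"^http_requests_duration_seconds_bucket\{(?P<labels>.*)\}\s+(?P<value>\S+)$"
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_buckets(text, endpoint):
    """Return [(le, count), ...] for one endpoint, in exposition order."""
    rows = []
    for line in text.splitlines():
        match = BUCKET_LINE.match(line.strip())
        if not match:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        if labels.get("endpoint") == endpoint:
            rows.append((labels["le"], float(match.group("value"))))
    return rows


def looks_flat(rows):
    """True when all buckets, including the smallest, share one non-zero count."""
    counts = {count for _, count in rows}
    return len(rows) > 1 and len(counts) == 1 and counts != {0.0}


def fetch_metrics(url=METRICS_URL):
    """Download the metrics page (needs a running Keto instance)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


# Two lines copied from the output in this report.
SAMPLE = """
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="0.005"} 7
http_requests_duration_seconds_bucket{app="keto",buildTime="undefined",code="403",endpoint="/relation-tuples/check",hash="undefined",method="post",version="master",le="+Inf"} 7
"""

rows = parse_buckets(SAMPLE, "/relation-tuples/check")
print(rows, "flat:", looks_flat(rows))
```

Against a live instance, `parse_buckets(fetch_metrics(), "/relation-tuples/check")` would replace the sample. A flat series is only suspicious when the request log shows durations well above the smallest bucket, as it does here.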

Relevant log output

on/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:43188 scheme:http] http_response=map[headers:map[content-type:application/json; charset=utf-8] size:18 status:403 text_status:Forbidden took:760.507203ms]
INFO[2024-10-28T15:29:25+01:00] started handling request                      http_request=map[headers:map[accept:application/json, */* accept-encoding:gzip, deflate connection:keep-alive content-length:21 content-type:application/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:43192 scheme:http]
INFO[2024-10-28T15:29:26+01:00] completed handling request                    http_request=map[headers:map[accept:application/json, */* accept-encoding:gzip, deflate connection:keep-alive content-length:21 content-type:application/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:43192 scheme:http] http_response=map[headers:map[content-type:application/json; charset=utf-8] size:18 status:403 text_status:Forbidden took:999.676135ms]
INFO[2024-10-28T15:29:27+01:00] started handling request                      http_request=map[headers:map[accept:application/json, */* accept-encoding:gzip, deflate connection:keep-alive content-length:21 content-type:application/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:55088 scheme:http]
INFO[2024-10-28T15:29:27+01:00] completed handling request                    http_request=map[headers:map[accept:application/json, */* accept-encoding:gzip, deflate connection:keep-alive content-length:21 content-type:application/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:55088 scheme:http] http_response=map[headers:map[content-type:application/json; charset=utf-8] size:18 status:403 text_status:Forbidden took:208.008314ms]
INFO[2024-10-28T15:29:28+01:00] started handling request                      http_request=map[headers:map[accept:application/json, */* accept-encoding:gzip, deflate connection:keep-alive content-length:21 content-type:application/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:55098 scheme:http]
INFO[2024-10-28T15:29:29+01:00] completed handling request                    http_request=map[headers:map[accept:application/json, */* accept-encoding:gzip, deflate connection:keep-alive content-length:21 content-type:application/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:55098 scheme:http] http_response=map[headers:map[content-type:application/json; charset=utf-8] size:18 status:403 text_status:Forbidden took:401.00658ms]
INFO[2024-10-28T15:29:29+01:00] started handling request                      http_request=map[headers:map[accept:application/json, */* accept-encoding:gzip, deflate connection:keep-alive content-length:21 content-type:application/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:55114 scheme:http]
INFO[2024-10-28T15:29:30+01:00] completed handling request                    http_request=map[headers:map[accept:application/json, */* accept-encoding:gzip, deflate connection:keep-alive content-length:21 content-type:application/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:55114 scheme:http] http_response=map[headers:map[content-type:application/json; charset=utf-8] size:18 status:403 text_status:Forbidden took:970.554767ms]
INFO[2024-10-28T15:29:31+01:00] started handling request                      http_request=map[headers:map[accept:application/json, */* accept-encoding:gzip, deflate connection:keep-alive content-length:21 content-type:application/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:55116 scheme:http]
INFO[2024-10-28T15:29:32+01:00] completed handling request                    http_request=map[headers:map[accept:application/json, */* accept-encoding:gzip, deflate connection:keep-alive content-length:21 content-type:application/json user-agent:HTTPie/1.0.3] host:localhost:4466 method:POST path:/relation-tuples/check query:<nil> remote:127.0.0.1:55116 scheme:http] http_response=map[headers:map[content-type:application/json; charset=utf-8] size:18 status:403 text_status:Forbidden took:922.347091ms]

Relevant configuration

No response

Version

All versions

On which operating system are you observing this issue?

Linux

In which environment are you deploying?

Kubernetes with Helm

Additional Context

No response