jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Prometheus http and kernel startup/shutdown metrics #1377

Open MaicoTimmerman opened 3 months ago

MaicoTimmerman commented 3 months ago

This MR is taking on #731.

jupyter_server already ships with a PrometheusMetricsHandler. However, that handler inherits from JupyterHandler, which requires authentication. The Enterprise Gateway project doesn't integrate with that method of authentication, so I added a separate handler to serve the metrics.

I've included 3 metrics in the initial implementation:

In terms of configuration, I've included the EG_METRICS_PREFIX environment variable for now, similar to how other configurations for the process proxies are set.
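As a rough illustration of how such a prefix could be applied, here is a minimal sketch. The helper `metric_name` and the constant `DEFAULT_PREFIX` are hypothetical names, and the default value `enterprise_gateway` is only inferred from the sample metric names in this description, not confirmed from the implementation.

```python
import os

# Assumption: the default prefix is inferred from the sample output in this PR.
DEFAULT_PREFIX = "enterprise_gateway"


def metric_name(base, environ=None):
    """Build a full metric name from EG_METRICS_PREFIX and a base name.

    `environ` can be passed explicitly to make the lookup easy to test;
    it falls back to the process environment.
    """
    environ = os.environ if environ is None else environ
    prefix = environ.get("EG_METRICS_PREFIX", DEFAULT_PREFIX)
    # An empty prefix leaves the base name untouched.
    return f"{prefix}_{base}" if prefix else base


# Explicit mappings stand in for the process environment here.
default_name = metric_name("http_request_duration_seconds", {})
custom_name = metric_name(
    "http_request_duration_seconds", {"EG_METRICS_PREFIX": "eg"}
)
```

With no variable set, the sketch yields the same name as in the sample responses; setting `EG_METRICS_PREFIX=eg` would shorten it to `eg_http_request_duration_seconds`.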

Sample responses from the /metrics endpoint
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.005",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.01",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.025",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.05",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.075",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.1",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.25",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.5",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.75",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="1.0",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="2.5",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="5.0",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="7.5",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="10.0",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="+Inf",method="POST",status_code="200"} 1.0
enterprise_gateway_http_request_duration_seconds_count{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",method="POST",status_code="200"} 1.0
enterprise_gateway_http_request_duration_seconds_sum{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",method="POST",status_code="200"} 19.83746862411499
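
To help read the sample above: histogram buckets are cumulative, so each `le` bucket counts every request that took at most that many seconds. The following sketch parses a few of the lines and derives the mean duration and the first bucket that contains an observation. The function names are hypothetical, the `handler` label is shortened for readability, and the label parsing is deliberately simplified (it does not handle escaped quotes).

```python
import re

# Matches `name{labels} value` lines of the Prometheus text format (simplified).
SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one labelled sample line into (name, labels, value)."""
    match = SAMPLE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample line: {line!r}")
    labels = dict(LABEL.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def summarize(lines):
    """Return (mean duration, smallest `le` bucket with a non-zero count)."""
    buckets, total, count = [], None, None
    for line in lines:
        name, labels, value = parse_sample(line)
        if name.endswith("_bucket"):
            # float() accepts "+Inf", so the last bucket sorts to the end.
            buckets.append((float(labels["le"]), value))
        elif name.endswith("_sum"):
            total = value
        elif name.endswith("_count"):
            count = value
    buckets.sort()
    first_hit = next((le for le, cumulative in buckets if cumulative > 0), None)
    mean = total / count if count else None
    return mean, first_hit


PREFIX = "enterprise_gateway_http_request_duration_seconds"
LABELS = 'handler="KernelActionHandler",method="POST",status_code="200"'
LINES = [
    f'{PREFIX}_bucket{{{LABELS},le="5.0"}} 0.0',
    f'{PREFIX}_bucket{{{LABELS},le="10.0"}} 0.0',
    f'{PREFIX}_bucket{{{LABELS},le="+Inf"}} 1.0',
    f'{PREFIX}_count{{{LABELS}}} 1.0',
    f'{PREFIX}_sum{{{LABELS}}} 19.83746862411499',
]

mean, first_hit = summarize(LINES)
```

For the sample shown, the single recorded request took about 19.8 seconds, so it only appears in the `+Inf` bucket: every finite bucket boundary (up to 10 seconds) is smaller than the observed duration.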
welcome[bot] commented 3 months ago

Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

MaicoTimmerman commented 3 months ago

The Read the Docs build failed due to timeouts:

    RuntimeError: Download error (28) Timeout was reached [https://conda.anaconda.org/free/noarch/repodata.json]
    Operation too slow. Less than 30 bytes/sec transferred the last 60 seconds