jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Metric Collection and Monitoring #731

Open IMAM9AIS opened 5 years ago

IMAM9AIS commented 5 years ago

Hi,

We have been trying to use JEG in our production systems, and over time it has become increasingly necessary for us to collect and monitor metrics on the kernels being spawned, the users using them, and the kinds of requests being made to the JEG servers.

That said, could we get some guidance on how to aggregate these metrics from the JEG servers so that they can be logged for monitoring purposes? To start with, we are thinking of collecting and reporting these metrics through a StatsD client library: https://pypi.org/project/pystatsd/
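For context on what a StatsD client sends, here is a minimal sketch using only the Python standard library rather than pystatsd itself. It assumes a StatsD daemon listening on UDP port 8125 on localhost; the metric names (`jeg.kernels.started` and so on) and the helper functions are hypothetical and not part of JEG.

```python
import socket

# StatsD metric type codes: "c" = counter, "g" = gauge, "ms" = timer.
SUPPORTED_TYPES = ("c", "g", "ms")


def format_statsd(name: str, value: float, metric_type: str) -> bytes:
    """Encode one metric in the plain-text StatsD line format: name:value|type."""
    if metric_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported StatsD metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name, value, metric_type, host="127.0.0.1", port=8125):
    """Send a single metric over UDP (fire-and-forget, no delivery guarantee)."""
    payload = format_statsd(name, value, metric_type)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

A call such as `send_metric("jeg.kernels.started", 1, "c")` would increment a counter each time a kernel is spawned. Because UDP is connectionless, a missing daemon does not block the gateway, which is one reason this transport is common for metrics.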

I am not sure whether this really qualifies as an issue, but it could certainly be a feature addition, with this as the starting point.

We are looking to collect the following generic information about the setup:

RPS (requests per second) on JEG.
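As a rough illustration of the RPS item, requests per second could be computed over a sliding time window. This is only a sketch: `RequestRateTracker` is a hypothetical helper, not something that exists in JEG, and where `record()` would be called (for example, in a request handler's setup step) is an assumption.

```python
import time
from collections import deque


class RequestRateTracker:
    """Tracks request timestamps and reports the average rate over a window."""

    def __init__(self, window_seconds: float = 60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the tracker can be tested
        self._timestamps = deque()

    def record(self) -> None:
        """Note that one request arrived now."""
        self._timestamps.append(self._clock())

    def rps(self) -> float:
        """Average requests per second over the most recent window."""
        cutoff = self._clock() - self.window_seconds
        # Drop timestamps that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window_seconds
```

The value returned by `rps()` could then be reported as a gauge to whichever metrics backend is chosen.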

kevin-bates commented 5 years ago

@IMAM9AIS - this would be fantastic! This seems to imply that we'd want to have our own handlers in place since some of these probably warrant updates to those locations - although I suppose that could be a discussion point.

With the persistent kernel session stuff, we already track kernels per user and can get total active kernels and users.

I don't know how much overlap there is with pystatsd, but I think it would be good to take a look at the telemetry stuff (event logging) that is underway in a couple of other Jupyter projects (Hub and Lab) from a synergy perspective. On the surface, that appears to be more of an auditing thing than metrics. That said, there are other metric pieces (via Prometheus) in place in various projects as well. I just want to make sure we're not adding yet another framework to the ecosystem when others exist and are adequate for our needs.

I hope that's helpful.

esevan commented 5 years ago

> there are other metric pieces (via prometheus) in place in various projects as well.

I love prometheus, too :+1:

IMAM9AIS commented 5 years ago

@kevin-bates @esevan Sounds good. We actually came across this PR, which added Prometheus-based metrics to the notebook server:
https://github.com/jupyter/notebook/pull/3490

However, this functionality does not seem to be enabled in JEG. We are trying to understand whether we can build on this PR to extend our solution and add more metrics.

kevin-bates commented 5 years ago

If you move to the master branch (where we've removed EG's dependency on Kernel Gateway), you should be able to get the /metrics endpoint exposed. I suspect this would take an approach similar to the one used in https://github.com/jupyter/enterprise_gateway/blob/master/enterprise_gateway/base/handlers.py, where the various mixins get added into the class derivation and the handler then essentially derives from Notebook's PrometheusMetricsHandler, similar to all the other handlers.
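To make the class-derivation idea concrete, here is a self-contained sketch of the mixin pattern. Every class in it is a hypothetical stand-in: `BaseMetricsHandler` plays the role of Notebook's PrometheusMetricsHandler, and the two mixins only imitate the kind of behavior EG's mixins add. None of these names are taken from the actual EG or Notebook code.

```python
class BaseMetricsHandler:
    """Stand-in for a metrics handler such as Notebook's PrometheusMetricsHandler."""

    def __init__(self):
        self.headers = {}

    def prepare(self):
        # End of the cooperative chain: nothing more to set up.
        pass

    def get(self):
        return "# metrics in Prometheus text format would be written here\n"


class CORSHeadersMixin:
    """Hypothetical mixin that adds a CORS header before the request is handled."""

    def prepare(self):
        self.headers["Access-Control-Allow-Origin"] = "*"
        super().prepare()


class TokenCheckMixin:
    """Hypothetical mixin that records whether the request carried the right token."""

    expected_token = "secret"  # assumption: a real gateway would read this from config

    def prepare(self):
        self.authorized = getattr(self, "token", None) == self.expected_token
        super().prepare()


class GatewayMetricsHandler(TokenCheckMixin, CORSHeadersMixin, BaseMetricsHandler):
    """Mixins listed first so their prepare() runs before the base handler's."""
```

Python's method resolution order runs `TokenCheckMixin.prepare`, then `CORSHeadersMixin.prepare`, then the base handler's, so the derived handler picks up gateway-specific behavior while the metrics logic stays in the base class.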