jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Feature Request: Health check for enterprise gateway #697

Open esevan opened 5 years ago

esevan commented 5 years ago

This is a feature request.

As EG becomes one of the core microservices in a scalable Jupyter deployment, it needs to be reliable.

There are many ways to improve reliability, such as HA support and session persistence, but I think the easiest is to recover to the desired state by restarting EG when it crashes.

If EG provides its liveness status via /healthz endpoint, we can easily diagnose the status of EG and restart it when it's not healthy.

Of course, enterprise-grade cluster platforms already provide good automation for this kind of recovery, such as Kubernetes Container Probes.

If folks like this idea, I'd like to discuss which unhealthy states can be tracked in EG, and how.

achandak123 commented 3 years ago

@esevan, I am interested in this feature request. Are you working on it, or does it still need to be discussed?

esevan commented 3 years ago

@achandak123, hi Amit,

Unfortunately, I'm not working on it, and I don't think I can take it on since I'm working on another project now. I'd appreciate it if someone on this thread could contribute the feature.

kevin-bates commented 2 years ago

A liveness check can be performed by issuing a GET against /api. This returns a JSON object containing the version string of the underlying Jupyter Server and the version string of the current EG instance.
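
For anyone who wants to script this, here is a minimal sketch of such a check in Python, using only the standard library. The helper names (`parse_api_payload`, `is_alive`), the base URL you pass in, and the 5-second timeout are illustrative assumptions, not part of EG. The sketch deliberately does not depend on the exact key names in the response; it only requires an HTTP 200 and a JSON object body.

```python
import json
import urllib.error
import urllib.request


def parse_api_payload(body: bytes) -> dict:
    """Decode an /api response body; raise ValueError if it is not a JSON object."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object from /api")
    return payload


def is_alive(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/api answers 200 with a JSON object.

    base_url is hypothetical here, e.g. the address where your EG instance
    listens. Any connection error, timeout, non-200 status, or malformed
    body is treated as "not alive".
    """
    url = base_url.rstrip("/") + "/api"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            parse_api_payload(resp.read())
            return True
    except (urllib.error.URLError, OSError, ValueError):
        # URLError/OSError: unreachable, refused, or timed out.
        # ValueError: body was not valid UTF-8 JSON or not an object.
        return False
```

A wrapper script could call `is_alive(...)` and exit non-zero on `False`, which is the shape an exec-style probe or an external watchdog typically expects. Note that this only shows the web server is responding; it says nothing about whether kernels can actually be launched.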

That said, I'd like to leave this open for further discussion. If the above is sufficient, I'm also happy to close the issue. :smile: