ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.06k stars 5.59k forks source link

[serve] Add a default route to ping the `check_health` method of an application #33979

Open edoakes opened 1 year ago

edoakes commented 1 year ago

We already have a way to configure a built-in health check within the cluster. We should expose this by default under a sub-route (like /{route_prefix}/-/healthz). This would call the check_health method (or do nothing by default) and return something simple like "ok".

Without this many users need to reimplement it themselves.

edoakes commented 1 year ago

We should also implement it in the HTTP proxy so users can have a cluster-level and per-application health check.

edoakes commented 1 year ago

cc @sihanwang41 @akshay-anyscale -- this is a very small item we can do in conjunction with health checking the proxies...

shrekris-anyscale commented 1 year ago

@edoakes I have a couple questions on this issue:

Do we want to provide an endpoint on each individual replica that calls its health check method? Or do we want to provide an endpoint per deployment on the HTTP Proxy that calls all the deployments' replicas' health check methods?

We should also implement it in the HTTP proxy so users can have a cluster-level and per-application health check.

What do you mean by cluster-level and per-application health check?

edoakes commented 1 year ago

We just want to have a default route per-application of: /{route_prefix/healthz/ that calls the check_health method of the ingress deployment.

Scratch the "HTTP proxy" part, that is already there /-/healthz