sunshineo opened this issue 4 years ago
It sounds reasonable to me. Are you willing to tackle this one?
I think it's a good PR but this is definitely a breaking change and should be denoted clearly in the release notes.
Currently, /parse/health is an endpoint that may be in use to tell whether Parse Server is reachable via a given domain, e.g. for connection string failover. So the information derived from the endpoint response changes slightly once it also depends on the health of the DB connection.
New info I learned: in Kubernetes (k8s), there is a liveness probe and a readiness probe. The liveness probe should be the current /parse/health. It only indicates whether the container is responsive, and if not, k8s will restart the container. But a readiness probe should include whether there is a proper database connection, because only then will k8s send traffic to it.
So we should have a separate URL like /parse/db-health
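For reference, a minimal sketch of how such probes could be wired up in a Kubernetes deployment. The paths are the ones discussed here; the port, timings, and thresholds are placeholder assumptions, not a recommendation:

```yaml
# Sketch only: hypothetical probe wiring for the endpoints discussed above
livenessProbe:
  httpGet:
    path: /parse/health      # process is up and responsive
    port: 1337
  periodSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /parse/db-health   # DB connection is established; gate traffic on this
    port: 1337
  periodSeconds: 5
  failureThreshold: 6
```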
Good point. Is there a scenario in which the DB is not ready immediately and will become ready after some time? Does parse server support a DB connection that is not ready on start-up?
What do you think about adding a general /parse/ready endpoint? I am thinking about a cache system or other modules that may need to become ready for Parse Server to function. All these modules could be summarized in a single ready endpoint, and it's easily extendable.
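A framework-agnostic sketch of such an accumulating ready check, assuming each module (DB, cache, ...) exposes an async check function. All names here are hypothetical, not existing Parse Server APIs:

```javascript
// Sketch: aggregate per-module readiness checks into one answer.
// Each check is an async function resolving to true when its module is ready.
// A check that rejects or resolves to anything but true counts as not ready.
async function isReady(checks) {
  const results = await Promise.allSettled(checks.map((check) => check()));
  return results.every((r) => r.status === 'fulfilled' && r.value === true);
}

// Hypothetical usage: a /parse/ready handler would return 200 or 503
// depending on the aggregated result.
const exampleChecks = [
  async () => true, // e.g. database connection check
  async () => true, // e.g. cache connection check
];
```

Adding a new module later only means pushing one more check function into the list, which is the extendability argument above.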
Thank you @mtrezza . I'll try to create a ready endpoint starting with just db and later we can add more. I plan to just do a query on the _User collection with count=0. I think it will work but let me know if there is a better way
It depends on how you define “ready” for the DB, which itself has a status endpoint.
For example:
The driver has an isConnected method to determine whether a connection between driver and DB is established.
https://mongodb.github.io/node-mongodb-native/3.5/api/MongoClient.html#isConnected
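For illustration, a sketch of a check built on that method. Note the hedge: isConnected exists on the 3.x driver linked above but was removed in the 4.x driver; the client argument is assumed to be an already constructed MongoClient:

```javascript
// Sketch: readiness check against the MongoDB 3.x driver.
// isConnected() only reports the driver's view of the connection; it does
// not guarantee the server can actually serve queries.
function dbLinkEstablished(client) {
  return typeof client.isConnected === 'function' && client.isConnected();
}
```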
The server status, which may be a good middle ground for your ready endpoint.
https://mongodb.github.io/node-mongodb-native/3.5/api/Admin.html#serverStatus
The replica set status, where you determine the individual node status. Just because you can execute a query, doesn’t mean the replica set is fully up and running and ready for what you consider your modus operandi. Maybe a separate PR in the future, if someone wants to get into these details.
https://mongodb.github.io/node-mongodb-native/3.5/api/Admin.html#replSetGetStatus
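A sketch of what checking the replica set status via the driver's admin API might look like (3.x driver API assumed; treating primary or secondary as “ready” is an assumption about the desired modus operandi, not a general rule):

```javascript
// Sketch: deeper readiness via the admin replSetGetStatus command.
// myState 1 = PRIMARY, 2 = SECONDARY; anything else is treated as not ready.
async function replicaSetReady(client) {
  const admin = client.db('admin').admin();
  const status = await admin.replSetGetStatus();
  return status.myState === 1 || status.myState === 2;
}
```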
Sent with GitHawk
> Thank you @mtrezza . I'll try to create a ready endpoint starting with just db and later we can add more. I plan to just do a query on the _User collection with count=0. I think it will work but let me know if there is a better way
I'd do a query on the schema table.
I did some research and came across two concepts for these endpoints. Maybe it can give you some inspiration.
1) Accumulative endpoints
- /health/live - The application is up and running.
- /health/ready - The application is ready to serve requests.
- /health - Accumulating all health check procedures in the application.
https://quarkus.io/guides/microprofile-health
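To make option 1 concrete, a small framework-agnostic sketch of how the three endpoints relate. The route names and status codes follow the Quarkus convention above; the function and flag names are hypothetical:

```javascript
// Sketch: map option-1 paths to HTTP status codes from a liveness flag
// and a readiness flag. The plain /health path accumulates both checks.
function healthStatus(path, state) {
  switch (path) {
    case '/health/live':
      return state.live ? 200 : 503;
    case '/health/ready':
      return state.ready ? 200 : 503;
    case '/health':
      return state.live && state.ready ? 200 : 503;
    default:
      return 404;
  }
}
```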
2) Separate endpoints
- GET /-/health - Only checks whether the application server is running. It does not verify that the database or other services are running. A successful response will return a 200 status code.
- GET /-/readiness - Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don't want to kill the application, but you don't want to send it requests either.
- GET /-/liveness - Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted.
- GET /-/metrics - Node process metrics; infrastructure to collect internal metrics (like southbound APIs); an API to retrieve the collected metrics and to reset them.
https://www.npmjs.com/package/@ozawa/express-health-check-middleware
I believe option 1 is good enough for now but option 2 would be really great. I have no idea how we could ensure all those states in option 2 though.
I think we should decide on a long-term option now, because if we want to change this in a few months, we may have a breaking change and things will get more complex.
Two options for /health:
- As a summary endpoint (everything is running and ready).
- As a single-aspect endpoint (Parse Server is running, but maybe not ready).

What is more common in 3rd party systems to ensure compatibility and practicability for this endpoint?
Looking around my k8s cluster, this is what prometheus uses:
Liveness: http-get http://:web/-/healthy delay=0s timeout=3s period=5s #success=1 #failure=6
Readiness: http-get http://:web/-/ready delay=0s timeout=3s period=5s #success=1 #failure=120
and this is what ambassador uses:
Liveness: http-get http://:8877/ambassador/v0/check_alive delay=30s timeout=1s period=3s #success=1 #failure=3
Readiness: http-get http://:8877/ambassador/v0/check_ready delay=30s timeout=1s period=3s #success=1 #failure=3
So IMHO focusing on ready vs. live, where the live probe avoids accidental restarts and the ready probe serves as a signal to start routing traffic, sounds better than focusing on overall health vs. db health.
This is a great resource where I think we should aim to go: https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes
@mman Great research, makes absolute sense. If I understand it correctly, it's option 2, right?
Yes, option 2, but also potentially changing the names of the endpoints to better indicate their purpose. /healthy + /ready as Prometheus uses sounds logical to me, perhaps keeping the legacy /health around pointing towards /healthy as well for compatibility reasons.
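A sketch of that compatibility alias. The route names are the proposed ones; the mechanism (resolving an alias before routing, rather than issuing a redirect) and the function name are hypothetical:

```javascript
// Sketch: serve the legacy /health path as an alias for /healthy,
// so existing probes keep working after a rename.
function canonicalPath(path) {
  const aliases = { '/health': '/healthy' };
  return aliases[path] || path;
}
```

An HTTP 301/308 redirect would also work, but some probe implementations do not follow redirects, so an in-process alias is the safer default.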
Is your feature request related to a problem? Please describe. Deployed Parse Server using k8s and used /parse/health for the readinessProbe and livenessProbe. But when new code was deployed that broke the database connection, /parse/health still returned 200 OK, so the old container was killed and the new, broken one was brought online.
Describe the solution you'd like Only return 200 OK after the db is connected, on /parse/health or /parse/db-health.
Describe alternatives you've considered Tried to create a cloud function that checks the db connection, but that requires a POST request with the application-id in the header, which k8s cannot do.
Additional context Related to https://github.com/parse-community/parse-server/issues/4575