hnb2 closed this issue 1 week ago
We had to do this intentionally: when /healthz wasn't returning a 200 during migrations and the migrations took a long time, the orchestrator (like Kubernetes) would mark the instance as unhealthy and kill it before the migrations could finish.
This led to the instance never managing to get started.
We understand that this is an anti-pattern, but until we have a better way to prevent Kubernetes from killing instances during long-running migrations, we can't change this.
Hi @netroy Thank you for the quick answer.
Do you suggest we instead monitor the home page (/) for anything other than a 200?
I saved a curl response during the incident and we had this:
< HTTP/2 503
(keeping in mind what you said about migrations)
We could add a /health/db endpoint that only returns 200 when the DB is ready. Would that help?
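A minimal sketch of how the two endpoints could be split, assuming a hypothetical `healthResponse` helper and a `dbReady` flag that the application sets once the DB connection is up and migrations have finished. None of these names come from n8n's code, and the 503 for "not ready" is an assumption; this only illustrates the idea:

```typescript
// Hypothetical sketch: names and response shapes are illustrative, not n8n's actual code.
type HealthResult = { status: number; body: { status: "ok" | "error" } };

// /healthz   -> liveness: 200 whenever the process is up, even during migrations,
//               so the orchestrator does not kill the instance mid-migration.
// /health/db -> readiness: 200 only when the DB is ready, 503 otherwise (assumed code).
function healthResponse(path: string, dbReady: boolean): HealthResult | undefined {
  if (path === "/healthz") {
    return { status: 200, body: { status: "ok" } };
  }
  if (path === "/health/db") {
    return dbReady
      ? { status: 200, body: { status: "ok" } }
      : { status: 503, body: { status: "error" } };
  }
  return undefined; // not a health route
}
```

With a split like this, the orchestrator could keep probing /healthz for liveness while alerting watches /health/db.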
Hi @netroy, that sounds like a good idea.
Just adding the internal reference for this issue which is N8N-7547
Fix got released with n8n@1.58.0
Bug Description
Hi there, we are monitoring the status of our n8n instance through the /healthz endpoint, but the instance went down several times and we always got
{"status": "ok"}
with a 200. However, if you go to the home page (just "/"), you will see a stack trace with a 503 error.
Stacktrace:
Logs on stdout:
Immediately after the error, no jobs at all were being handled and they were all lost.
Thank you for your help, please let me know if you need any other details.
To Reproduce
I'm not sure how to intentionally reproduce this, probably by shutting down the DB temporarily? We are using Postgres 15, if it makes any difference.
Right after the incident I checked the metrics on the DB and the application server and everything was fine, no overload or anything suspicious.
Expected behavior
I would expect the /healthz endpoint to return a different body and status to indicate the failure so our alerting can do its job.
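For illustration, this is roughly the check our alerting would apply to a health response. `indicatesFailure` is a hypothetical helper on the monitoring side, not part of n8n, and the expected body shape is assumed from the `{"status": "ok"}` response shown above:

```typescript
// Hypothetical monitor-side helper; not part of n8n.
// Treats anything other than HTTP 200 with a JSON body of {"status": "ok"} as a failure.
function indicatesFailure(statusCode: number, rawBody: string): boolean {
  if (statusCode !== 200) {
    return true; // e.g. the 503 seen on "/" during the incident
  }
  try {
    const parsed: unknown = JSON.parse(rawBody);
    const isOk =
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { status?: unknown }).status === "ok";
    return !isOk;
  } catch {
    return true; // non-JSON body (for example an HTML error page) counts as a failure
  }
}
```

A check like this can only fire if the endpoint actually reports a non-200 status or a different body when the instance is broken.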
Operating System
Linux
n8n Version
1.51.2
Node.js Version
18
Database
PostgreSQL
Execution mode
queue