levibostian / ExpressjsBlanky

Blank Express.js project to get up and running FAST.
MIT License

Healthchecks should only be for DBs and other critical checks. #27

Closed by levibostian 4 years ago

levibostian commented 4 years ago

I have an app deployed on k8s. I experienced some downtime recently for my app.

At the same time that the downtime happened, I received a Honeybadger report. Here is what was happening:

  1. A service I use, Postmark, was unreachable via its API when my app sent a ping during a healthcheck.
  2. The healthcheck failed, which threw an exception that crashed the app. The app is currently set up so that any uncaught exception crashes it so it can restart.
  3. k8s performed a healthcheck on all of the pods, including the new pods created when the original pods crashed.
  4. All of the app's pods crashed and went down, which resulted in downtime.

Expected outcome

A failing healthcheck returns false but does not crash the app. k8s then stops directing traffic to that pod.
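
A minimal sketch of that behavior, assuming each dependency check is an async function that throws on failure. The names `CheckResult`, `runCheck` and `pingUnreachableService` are hypothetical and not taken from this project or from terminus; the point is only that a thrown error becomes an unhealthy result instead of an uncaught exception.

```typescript
// Hypothetical types and names for illustration; not this project's code.
type CheckResult = { name: string; healthy: boolean; error?: string };

// Runs one dependency check and converts a thrown error or rejected
// promise into an unhealthy result instead of letting it propagate.
async function runCheck(
  name: string,
  check: () => Promise<void>
): Promise<CheckResult> {
  try {
    await check();
    return { name, healthy: true };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { name, healthy: false, error: message };
  }
}

// Stand-in for a ping to an external API that is currently unreachable.
async function pingUnreachableService(): Promise<void> {
  throw new Error("connect ETIMEDOUT");
}
```

With this shape, `runCheck("postmark", pingUnreachableService)` resolves to an unhealthy result, and the healthcheck endpoint can decide what to respond with rather than taking the process down.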

I might want to consider everywhere else in my app this could happen. When an exception happens in the app, do we want the app to restart? Is that intended behavior? Restarting should happen only when an error is uncaught, meaning it was not handled. We need to catch more exceptions and return a 500 instead of crashing the pod, to prevent downtime. Downtime is bad because then no client can reach any of the other endpoints!

levibostian commented 4 years ago

After further inspection, the 503 response is sent by terminus. I thought that nginx ingress sent the 503 when the pods were down. I did not realize that terminus sends this response.

This makes me think, however, that the Postmark healthcheck might be good as a readiness check when the app is starting up, to make sure that the client is set up correctly, but Postmark is not a core part of the app the way a Redis or Postgres DB is. Let's move the Postmark check to the readiness check that runs when the app starts up.
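
One way to picture the split, as a rough sketch only: tag each check with when it should run, then pick the list per probe. The `Check` type, the `runOn` values and `checksToRun` are assumptions made up for this example; they are not terminus options or code from this repo.

```typescript
// Hypothetical model for illustration; not terminus's API.
type Check = { name: string; runOn: "startup-only" | "every-probe" };

const checks: Check[] = [
  { name: "postgres", runOn: "every-probe" },  // core dependency
  { name: "redis", runOn: "every-probe" },     // core dependency
  { name: "postmark", runOn: "startup-only" }, // only verify client setup at boot
];

// Returns the names of the checks to run for a given probe.
// At startup every check runs; afterwards only the core ones do.
function checksToRun(all: Check[], isStartup: boolean): string[] {
  return all
    .filter((c) => isStartup || c.runOn === "every-probe")
    .map((c) => c.name);
}
```

Here `checksToRun(checks, true)` includes `postmark`, while `checksToRun(checks, false)` leaves it out, so a Postmark outage after startup no longer marks a running pod as unhealthy.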

levibostian commented 4 years ago

Done