tilt-dev / tilt

Define your dev environment as code. For microservice apps on Kubernetes.
https://tilt.dev/
Apache License 2.0
7.67k stars 301 forks

Cluster health check failure can get stuck #5729

Open milas opened 2 years ago

milas commented 2 years ago

Expected Behavior

Current Behavior

Steps to Reproduce

This is a recent feature and we've only had this reported once via Slack, but Tilt was showing an error on the /livez check, and the user reported that the same request was succeeding via curl at that point.

(Screenshot of the error, 2022-04-22 9:47:08 AM)

They mentioned getting into this state after putting their laptop to sleep for the day and returning the next morning.

andymartin-sch commented 2 years ago

a few more developers of ours saw this recently - it would be nice to improve this because getting stuck here seems like a regression caused by the (otherwise great) health check functionality being added

milas commented 2 years ago

@andymartin-sch Thanks for the extra reports - agreed this is not the experience we want here; I'm hoping to include at least some form of remediation in our release today.

In the cases you've seen, has the error shown in the Tilt UI been similar to that in the issue above? If so, do you know if anyone tried manually accessing the endpoint (e.g. curl https://..../livez) and whether that was successful?
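For anyone who wants to script the manual check rather than run curl by hand, here is a minimal sketch in Python. It is not how Tilt performs its health check; `probe_livez` and its parameters are hypothetical names made up for this example, and the base URL is whatever your own cluster's API server address is. A real cluster will usually also need TLS settings and possibly credentials, which the sketch leaves to an optional `ssl.SSLContext` passed as `context`.

```python
import urllib.error
import urllib.request


def probe_livez(base_url, timeout=5.0, context=None):
    """Send GET <base_url>/livez and return (healthy, detail).

    Hypothetical helper for illustration only; base_url is supplied
    by the caller and is not taken from this issue.
    """
    url = base_url.rstrip("/") + "/livez"
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            return resp.status == 200, f"HTTP {resp.status}: {body}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return False, f"HTTP {err.code}: {err.reason}"
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, TLS error, timeout, and so on.
        return False, f"request failed: {err}"
```

Calling `probe_livez("<your API server URL>")` while Tilt is showing the error gives a quick answer to the question above: if it returns `True` while the UI still reports a failure, the endpoint is fine and the stale state is on Tilt's side. With a working kubeconfig, `kubectl get --raw='/livez?verbose'` is another way to hit the same endpoint.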

andymartin-sch commented 2 years ago

> In the cases you've seen, has the error shown in the Tilt UI been similar to that in the issue above?

yeah pretty much the exact same

> If so, do you know if anyone tried manually accessing the endpoint (e.g. curl https://..../livez) and whether that was successful?

I don't think so but we can do that going forward and will let you know - thanks!!

andymartin-sch commented 2 years ago

ah, one developer just said:

> When I hit this, I went to that endpoint in my browser and it returned “ok”

milas commented 2 years ago

A couple of improvements/fixes went into v0.29.0 (released May 6) - please let me know if you still see the issue after upgrading!