Closed rogoman closed 6 years ago
One possible fix would be for the marathon.v2.Api object to also look at the app/healthChecks property in the JSON returned from v2/apps/[appId] to determine whether an app has any health checks configured at all. If so, a task should be excluded from the load-balancing pool if its healthCheckResults property is an empty array.
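The rule proposed above can be sketched in a few lines. This is only an illustration in Python, not linkerd's actual Scala implementation: the function names `is_routable` and `routable_tasks` and the `sample` payload are hypothetical, and treating a task as healthy only when every result reports `alive` is an assumption. The field names (`app`, `healthChecks`, `tasks`, `healthCheckResults`, `alive`) follow the JSON that Marathon returns from v2/apps/[appId].

```python
def is_routable(task, app_has_health_checks, use_health_check=True):
    """Decide whether a task may join the load-balancing pool (hypothetical helper)."""
    if not use_health_check:
        return True
    results = task.get("healthCheckResults") or []
    # Proposed rule: the app defines health checks but this task has no
    # results yet (for example, it is still in its grace period), so skip it.
    if app_has_health_checks and not results:
        return False
    # Assumption: a task is healthy only if every reported result is alive.
    # With no checks configured, results is empty and the task is kept.
    return all(r.get("alive", False) for r in results)


def routable_tasks(response, use_health_check=True):
    """Filter the tasks of a v2/apps/[appId] response down to routable ones."""
    app = response.get("app", {})
    has_checks = bool(app.get("healthChecks"))
    return [
        t for t in app.get("tasks", [])
        if is_routable(t, has_checks, use_health_check)
    ]


# Made-up response: one healthy task, one failing task, and one freshly
# started task that has no health check results yet.
sample = {
    "app": {
        "id": "/slow-app",
        "healthChecks": [{"protocol": "HTTP", "path": "/health", "gracePeriodSeconds": 60}],
        "tasks": [
            {"id": "healthy", "healthCheckResults": [{"alive": True}]},
            {"id": "failing", "healthCheckResults": [{"alive": False}]},
            {"id": "starting", "healthCheckResults": []},
        ],
    }
}

routable_ids = [t["id"] for t in routable_tasks(sample)]
```

With this sketch, only the task with a passing result stays in the pool. An app that defines no health checks at all keeps all of its tasks, which is why the app-level healthChecks property has to be consulted first.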
Thanks for filing this @rogoman! If you are up for it, we would be happy to review a PR with the change you described above. We love receiving PRs from the community.
@dadjeibaah PR ready: https://github.com/linkerd/linkerd/pull/2099
Issue Type:
What happened: I have linkerd configured with the useHealthCheck flag set to true. I scaled up one of the apps in my DC/OS cluster. Its health-check grace period is set to 60 seconds because the app takes quite a long time to start up. Unfortunately, the useHealthCheck switch only affects apps that have a failed health check, not those whose health state is still unknown, so linkerd routed requests to the new instance of my app even though it wasn't ready.
What you expected to happen: No requests should be routed to a service until Marathon marks it as healthy. Instances in a health-check grace period should not receive any requests.
How to reproduce it (as minimally and precisely as possible): Configure a marathon namer with useHealthCheck: true. Run a Marathon app with a long health-check grace period. Keep sending requests to the app via linkerd. Observe what happens when you scale up such an app.
Anything else we need to know?:
Environment: