microsoft / service-fabric-aspnetcore

This repo contains ASP.NET Core integration for Service Fabric Reliable Services.
Other
152 stars 49 forks source link

SF Reverse Proxy and AspNetCore Healthchecks that return a 503 #70

Closed NArnott closed 5 years ago

NArnott commented 5 years ago

Is your feature request related to a problem? Please describe. I'm implementing health checks that were introduced in AspNetCore 2.2. If a health check produces an unhealthy result, the resulting http health check returns a 503.

However, when accessing my service through the SF reverse proxy, 503 causes an endless retry loop and the client never gets a response.

Describe the solution you'd like When a true 404 is returned, the SF reverse proxy integration middleware forces a special X-ServiceFabric: ResourceNotFound header to hint to the reverse proxy of that case. It would be nice if we could do a similar hint, such as X-ServiceFabric: DisableRetry to ask the reverse proxy to not attempt a retry.

amanbha commented 5 years ago

503 is service unavailable so Reverse Proxy re-resolves. Does your service keep on returning 503 once it gets unhealthy result and never returns back to normal operation. Adding the new value to header would need a corresponding change in ReverseProxy to handle it. Can the existing "ResourceNotFound" value be used for this.

NArnott commented 5 years ago

The health check url, for example, would be GET http://mydomain/health. I want this to return 503. I don't want the Reverse Proxy to retry it, as that kind of negates the reason for it in the first place. For example, nginx could hit this endpoint to determine if the service is down and stop sending requests to that service. Right now, the call to /health will just hang because the SF reverse proxy is constantly retrying.

Second problem this introduces is that each call to /health runs several health checks, such as the status of SQL connections, dependent services, etc. Since the SF reverse proxy is constantly retrying, those downstream calls are also getting called endlessly.

amanbha commented 5 years ago

Thanks for explaining your scenario, Is 503 retuned by your code or its the error code returned by aspnetcore? 503 http code means service unavailable and that's why ReverserProxy is retrying.

@kavyako Adding the new value to header would need a corresponding change in ReverseProxy to handle it. Can the existing X-ServiceFabric: ResourceNotFound header value be used with 503 to disable resolves OR X-ServiceFabric: ResourceNotFound header is only valid with 404?

NArnott commented 5 years ago

The 503 is returned by the default AspNetCore Health Checks implementation (https://github.com/aspnet/Diagnostics/blob/master/src/Microsoft.AspNetCore.Diagnostics.HealthChecks/HealthCheckOptions.cs).

I understand why the reverse proxy is doing what it's doing, but in this case, it doesn't make sense for it to retry, and the goal of a health check is to know the current state of the service, without retries.

amanbha commented 5 years ago

@NArnott I discussed this offline with @kavyako and we will add this to ReverserProxy (for our future SF runtime release) to handle the case for 503, the exact header details is still TBD at the moment. I was thinking that to unblock you for now, could you add a middleware which detects this and return 404 with X-ServiceFabric: ResourceNotFound so that ReverseProxy doesn't do a reresolve on it forever, I understand its less than ideal, but if its a blocker for your scenario, you can try it.

NArnott commented 5 years ago

@amanbha Thanks for the following up, and am looking forward to the new feature.

I'm thinking for now I'll just adjust the health checks to return a 200 as well, and I'll pass something in the body of the response that reveals the true status. A bit hacky, but should work for the short term.

amanbha commented 5 years ago

@NArnott This feature has been added to Sf ReverseProxy, it will not try to reresolve again when header X-ServiceFabric=NoRetry is present. This is available starting with runtime version 6.4.658