grpc / grpc-dotnet

gRPC for .NET
Apache License 2.0
4.19k stars 769 forks source link

gRPC requests intermittently taking 1 min 40 #2320

Open jameswoodley opened 11 months ago

jameswoodley commented 11 months ago

We've recently added gRPC to our system, hosted in Azure using a Linux App Service. It seems intermittently we get hit with a client hitting our timeout (1min 40). DataDog has tracing and logging enabled, and we get next to no information:

image image

The exception thrown on the server is:

System.IO.IOException: The request stream was aborted.
 ---> Microsoft.AspNetCore.Connections.ConnectionAbortedException: Reading the request body timed out due to data arriving too slowly. See MinRequestBodyDataRate.
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.GetReadAsyncResult()
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Connection.CopyPipeAsync(PipeReader reader, PipeWriter writer)
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.GetReadAsyncResult()
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2Connection.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
   --- End of inner exception stack trace ---
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.GetReadAsyncResult()
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2MessageBody.ReadAsync(CancellationToken cancellationToken)
   at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
   at Grpc.AspNetCore.Server.Internal.PipeExtensions.ReadSingleMessageAsync[T](PipeReader input, HttpContextServerCallContext serverCallContext, Func`2 deserializer)
   at Grpc.AspNetCore.Server.Internal.CallHandlers.UnaryServerCallHandler`3.HandleCallAsyncCore(HttpContext httpContext, HttpContextServerCallContext serverCallContext)
   at Grpc.AspNetCore.Server.Internal.CallHandlers.ServerCallHandlerBase`3.<HandleCallAsync>g__AwaitHandleCall|8_0(HttpContextServerCallContext serverCallContext, Method`2 method, Task handleCall)

The client throws:

Grpc.Core.RpcException: Status(StatusCode="Unknown", Detail="Bad gRPC response. HTTP status code: 524")
   at FindYourGrind.Auth.Sdk.Users.Users.GetUserById(Guid userId, CancellationToken cancellationToken)
   at FindYourGrind.Auth.Sdk.Users.Users.<>c__DisplayClass12_0.<<IsUserActive>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at LazyCache.CachingService.<>c__DisplayClass28_1`1.<<GetOrAddAsync>b__1>d.MoveNext()
--- End of stack trace from previous location ---
   at LazyCache.CachingService.GetOrAddAsync[T](String key, Func`2 addItemFactory, MemoryCacheEntryOptions policy)
   at FindYourGrind.Auth.Sdk.Users.Users.IsUserActive(Guid userId, CancellationToken cancellationToken)
   at B2B.Educator.Engine.Infrastructure.Identity.JwtValidatorHelper.OnTokenValidatedHelperV2(IFygAuthSdk authSdk, ClaimsPrincipal claimsPrincipal, Action`1 failure, Action success) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/Infrastructure/Identity/JwtValidatorHelper.cs:line 184
   at B2B.Educator.Engine.Infrastructure.Identity.JwtValidatorHelper.OnTokenValidatedHelper(TokenValidatedContext tvc) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/Infrastructure/Identity/JwtValidatorHelper.cs:line 53
   at Microsoft.AspNetCore.Authentication.JwtBearer.JwtBearerHandler.HandleAuthenticateAsync()
   at Microsoft.AspNetCore.Authentication.JwtBearer.JwtBearerHandler.HandleAuthenticateAsync()
   at Microsoft.AspNetCore.Authentication.AuthenticationHandler`1.AuthenticateAsync()
   at Microsoft.AspNetCore.Authentication.AuthenticationService.AuthenticateAsync(HttpContext context, String scheme)
   at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
   at NSwag.AspNetCore.Middlewares.SwaggerUiIndexMiddleware.Invoke(HttpContext context)
   at NSwag.AspNetCore.Middlewares.RedirectToIndexMiddleware.Invoke(HttpContext context)
   at NSwag.AspNetCore.Middlewares.OpenApiDocumentMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware.<Invoke>g__Awaited|6_0(ExceptionHandlerMiddleware middleware, HttpContext context, Task task)
   at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware.HandleException(HttpContext context, ExceptionDispatchInfo edi)
   at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware.<Invoke>g__Awaited|6_0(ExceptionHandlerMiddleware middleware, HttpContext context, Task task)
   at Datadog.Trace.ClrProfiler.AutoInstrumentation.AspNetCore.BlockingMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Server.IIS.Core.IISHttpContextOfT`1.ProcessRequestAsync()

Client Channel Setup:

        {
            var token = await tokenGenerator.GetApplicationToken(context.CancellationToken).ConfigureAwait(false);
            metadata.Add("Authorization", $"Bearer {token}");
        });

        var methodConfig = new MethodConfig
        {
            Names = { MethodName.Default },
            RetryPolicy = new RetryPolicy
            {
                MaxAttempts = 5,
                InitialBackoff = TimeSpan.FromSeconds(1),
                MaxBackoff = TimeSpan.FromSeconds(5),
                BackoffMultiplier = 1.5,
                RetryableStatusCodes =
                {
                    StatusCode.Unavailable, StatusCode.DeadlineExceeded, StatusCode.ResourceExhausted,
                    StatusCode.Aborted
                }
            }
        };
        _channel = GrpcChannel.ForAddress(environment.Address, new GrpcChannelOptions
        {
            HttpHandler = new SocketsHttpHandler
            {
                EnableMultipleHttp2Connections = true,
                PooledConnectionIdleTimeout = Timeout.InfiniteTimeSpan,
                KeepAlivePingDelay = TimeSpan.FromSeconds(60),
                KeepAlivePingTimeout = TimeSpan.FromSeconds(30)
            },
            Credentials = ChannelCredentials.Create(new SslCredentials(), credentials),
            ServiceConfig = new ServiceConfig { MethodConfigs = { methodConfig } }
        });

Upon disposal of the class, we call _channel.ShutDownAsync(); _channel.Dispose();

I've recently added in some retry and keep alive logic, but is there anything else we should be doing? I'm tearing my hair out with why intermittently it takes so long

jameswoodley commented 11 months ago

We also just got this on the server:

System.IO.IOException: The request stream was aborted.
 ---> Microsoft.AspNetCore.Connections.ConnectionAbortedException: The HTTP/2 connection faulted.
   --- End of inner exception stack trace ---
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.GetReadAsyncResult()
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2MessageBody.ReadAsync(CancellationToken cancellationToken)
   at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
   at Grpc.AspNetCore.Server.Internal.PipeExtensions.ReadSingleMessageAsync[T](PipeReader input, HttpContextServerCallContext serverCallContext, Func`2 deserializer)
   at Grpc.AspNetCore.Server.Internal.CallHandlers.UnaryServerCallHandler`3.HandleCallAsyncCore(HttpContext httpContext, HttpContextServerCallContext serverCallContext)
   at Grpc.AspNetCore.Server.Internal.CallHandlers.ServerCallHandlerBase`3.<HandleCallAsync>g__AwaitHandleCall|8_0(HttpContextServerCallContext serverCallContext, Method`2 method, Task handleCall)

And this on the client

Grpc.Core.RpcException: Status(StatusCode="Internal", Detail="Bad gRPC response. HTTP status code: 400")
   at Grpc.Net.Client.Internal.Retry.RetryCallBase`2.GetResponseCoreAsync()
   at FindYourGrind.Auth.Sdk.Users.Users.GetUserById(Guid userId, CancellationToken cancellationToken)
   at B2B.Educator.Engine.Application.UserInfo.Queries.Favorites.GetUserFavoritesHandler.Handle(GetUserFavorites request, CancellationToken cancellationToken) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/Application/UserInfo/Queries/Favorites/GetUserFavorites.cs:line 51
   at B2B.Educator.Engine.Application.Common.Behaviours.PerformanceBehaviour`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/Application/Common/Behaviours/PerformanceBehaviour.cs:line 59
   at B2B.Educator.Engine.Application.Common.Behaviours.ValidationBehaviour`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/Application/Common/Behaviours/ValidationBehaviour.cs:line 41
   at B2B.Educator.Engine.Application.Common.Behaviours.AuthorizationBehaviour`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/Application/Common/Behaviours/AuthorizationBehaviour.cs:line 0
   at B2B.Educator.Engine.Application.Common.Behaviours.UnhandledExceptionBehaviour`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/Application/Common/Behaviours/UnhandledExceptionBehaviour.cs:line 47
   at B2B.Educator.Engine.Application.Common.Behaviours.RetryPolicyBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/Application/Common/Behaviours/RetryPolicyBehavior.cs:line 100
   at B2B.Educator.Engine.Application.Common.Behaviours.TransactionBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/Application/Common/Behaviours/TransactionBehavior.cs:line 52
   at B2B.Educator.Engine.Application.Common.Behaviours.OutboxBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/Application/Common/Behaviours/OutboxBehavior.cs:line 0
   at MediatR.Pipeline.RequestExceptionProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
   at MediatR.Pipeline.RequestExceptionProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
   at MediatR.Pipeline.RequestExceptionActionProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
   at MediatR.Pipeline.RequestExceptionActionProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
   at MediatR.Pipeline.RequestPostProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
   at MediatR.Pipeline.RequestPreProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
   at B2B.Educator.Engine.Api.Controllers.UserInfoController.GetUserFavorites(CancellationToken cToken) in /home/runner/work/B2B-Educator-Engine/B2B-Educator-Engine/src/WebUI/Controllers/UserInfoController.cs:line 97
   at lambda_method212(Closure , Object )
   at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Logged|12_1(ControllerActionInvoker invoker)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeInnerFilterAsync>g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeNextExceptionFilterAsync>g__Awaited|26_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
JamesNK commented 11 months ago

This is a difficult problem to solve. There are many layers between the client and the server in Azure App Service. The first set of client and server errors suggests the HTTP request reaches the server but not all the data is arriving. How big is the message the client is sending to the server?

What version of Grpc.Net.Client, Grpc.AspNetCore and .NET are you using? I recommend using the latest version of Grpc.Net.Client.

One option is to simplify your scenario. You could create a channel without any customization and see whether it works. For example:

        _channel = GrpcChannel.ForAddress(environment.Address, new GrpcChannelOptions
        {
            Credentials = ChannelCredentials.Create(new SslCredentials(), credentials)
        });

If the issue disappears after the changes, it could narrow the problem to something in the retry logic.

Another thing to try is disabling the channel's advanced connectivity features. You can do that by setting a property on the handler:

        var handler = new SocketsHttpHandler();
        handler.Properties["__GrpcLoadBalancingDisabled"] = true;
        _channel = GrpcChannel.ForAddress(environment.Address, new GrpcChannelOptions
        {
            Credentials = ChannelCredentials.Create(new SslCredentials(), credentials),
            HttpHandler = handler
        });
jameswoodley commented 11 months ago

Hey @JamesNK thanks for the reply! So the message being sent is pretty small. Protos are:

UUID:

message UUID {
  string value = 1;
}

GetUserByIdRequest

message GetUserByIdRequest {
  FindYourGrind.Grpc.UUID id = 1;
}

Versions Grpc.Net.Client: 2.59.0 Grpc.AspNetCore: 2.59.0 Dotnet: 7

In terms of the retry, the same issue was happening before the retry logic was added, it was actually added to try and resolve/mitigate this issue

I'll try the simplified scenario first, if no joy I'll disable the advanced connectivity features and report back.

With regards hosting in Azure, would you recommend a different hosting solution than Azure App Service?

jameswoodley commented 11 months ago

Unfortunately the simplified scenario hasn't worked. So moving on to disabling the advanced connectivity features

image image

Channel is now created as:

var credentials = CallCredentials.FromInterceptor(async (context, metadata) =>
{
    var token = await tokenGenerator.GetApplicationToken(context.CancellationToken).ConfigureAwait(false);
    metadata.Add("Authorization", $"Bearer {token}");
});

var handler = new SocketsHttpHandler();
handler.Properties["__GrpcLoadBalancingDisabled"] = true;
_channel = GrpcChannel.ForAddress(environment.Address, new GrpcChannelOptions
{
    Credentials = ChannelCredentials.Create(new SslCredentials(), credentials)
});

Will report back from that change

jameswoodley commented 11 months ago

Unfortunately no dice - I think for now we're going to have to go back to Web Api

JamesNK commented 11 months ago

Damn.

In one way it is good: reducing the gRPC client to its most basic behavior means the problem isn't there. It is bad because the problem is somewhere that is harder to change: HttpClient or ASP.NET Core or Azure App Service. I think the likely cause is Azure App Service and how it handles HTTP/2. It's the complex variable in this solution. Something causes the request to stop, and then the request is killed after 100 seconds.

Two interesting experiments you could make:

  1. Test whether gRPC-Web works works over HTTP/1.1. See https://learn.microsoft.com/en-us/aspnet/core/grpc/grpcweb?view=aspnetcore-8.0 (assuming you don't use streaming). That removes another layer of complexity.
  2. Or, use Web API, but see whether the problem persists if Web API is used over HTTP/2.

I'll pass this issue on to Azure App Service people to see whether they have any ideas.

btw I'm on holiday this week and back Nov 27.

jameswoodley commented 11 months ago

Hi @JamesNK - we've been running Web API over HTTP2 for just over a week now and now seen these same issues. So it's definitely a gRPC thing I think!

Hope you had a good holiday! Thanks for the help

JamesNK commented 11 months ago

Hi @JamesNK - we've been running Web API over HTTP2 for just over a week now and now seen these same issues. So it's definitely a gRPC thing I think!

I think there is a typo in your message. Do you have this problem with Web API or is it just gRPC?

jameswoodley commented 11 months ago

Hi @JamesNK - we've been running Web API over HTTP2 for just over a week now and now seen these same issues. So it's definitely a gRPC thing I think!

I think there is a typo in your message. Do you have this problem with Web API or is it just gRPC?

You're correct. Should have said noT seen these same issues. Web Api has been working fine

aadog commented 9 months ago

Encountering the same problem, GRPC cannot throw

somatech-uk commented 1 month ago

I'm encountering a similar issue, but what solves it is moving the app service to a new app service plan.

@jameswoodley did you ever come back to implementing gRPC ? I'm thinking our only option is to move back to REST calls

jameswoodley commented 1 month ago

@somatech-uk no we decided to go back to REST Web Api and be done with it. No issues since