grpc-ecosystem / grpc-spring

Spring Boot starter module for gRPC framework.
https://grpc-ecosystem.github.io/grpc-spring/
Apache License 2.0

High latency on first grpc call #931

Closed 313hemant313 closed 1 year ago

313hemant313 commented 1 year ago

After deploying a service that internally calls multiple gRPC services and methods, the first call from this service to each of those gRPC services shows high latency.

The latency measured inside the downstream service, once the call has arrived there and the work is done, is low.

Example:

ServiceOrchestrator

ServiceX

ServiceY

So maybe channel creation is taking some time. (We use certificate-based security via the trustCertCollection config.)

Is the above assumption correct? And is there a way to log the gRPC channel creation time?

ST-DDT commented 1 year ago

It is not the channel bean that takes so long to create; it is probably the DNS/address lookup and the TCP connection that are slow. I have experienced the same; it is usually no worse than the first HTTP request.
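To check whether the lookup or the connection is the slow part, the two phases can be timed separately with plain JDK classes. The following is only a rough sketch: the class name `ConnectTiming` and its `measure` method are made up for illustration, it does not go through grpc-java at all, and it does not include the TLS handshake, which adds further time when certificates are used.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of grpc-spring or grpc-java).
public class ConnectTiming {

    /**
     * Returns { dnsMillis, tcpMillis } for a single lookup and connect attempt.
     * Only the address lookup and the TCP handshake are measured, not TLS.
     */
    public static long[] measure(String host, int port, int timeoutMillis) throws IOException {
        long t0 = System.nanoTime();
        InetAddress address = InetAddress.getByName(host); // DNS / address lookup
        long t1 = System.nanoTime();
        try (Socket socket = new Socket()) {
            // Plain TCP connect with an explicit timeout
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
        }
        long t2 = System.nanoTime();
        return new long[] { (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000 };
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ConnectTiming <host> <port>");
            return;
        }
        long[] r = measure(args[0], Integer.parseInt(args[1]), 5_000);
        System.out.println("DNS lookup: " + r[0] + " ms, TCP connect: " + r[1] + " ms");
    }
}
```

Running it twice against the address of ServiceX or ServiceY gives a first impression of how much of the delay comes from a cold lookup; the JVM and the operating system may cache the address after the first run.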

There are two "workarounds" for that:

1) Enable the connect on startup feature.

https://github.com/yidongnan/grpc-spring-boot-starter/blob/162b7179f5f6ce03682ed73fa76a8f9ef753bf08/grpc-client-spring-boot-autoconfigure/src/main/java/net/devh/boot/grpc/client/config/GrpcChannelProperties.java#L405

2) Create a new bean/config that has all @GrpcClient targets (names) and use each Channel to fire a health check or other request against it (the server doesn't have to actually implement the request, and you don't have to wait for the response; the point is just to trigger the connection).

If your application has lots of idle time you might want to call that request periodically or enable keep alive.
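The second workaround, combined with the periodic call, could look roughly like the sketch below. It is deliberately library-agnostic: `ChannelWarmup` and its methods are hypothetical names, and each registered `Runnable` stands in for whatever triggers the connection in your setup (for example a health check request on the channel, or grpc-java's `ManagedChannel.getState(true)`, which requests a connection). A failing target is only logged, so optional clients do not break the others.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; register all targets before calling start().
public class ChannelWarmup {

    private final Map<String, Runnable> pings = new LinkedHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "grpc-warmup");
                t.setDaemon(true); // do not keep the JVM alive just for warm-up
                return t;
            });

    /** Registers the action that triggers a connection for one client name. */
    public ChannelWarmup register(String clientName, Runnable ping) {
        pings.put(clientName, ping);
        return this;
    }

    /** Pings every target once now and then again after each period (must be > 0). */
    public void start(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(this::pingAll, 0, periodSeconds, TimeUnit.SECONDS);
    }

    /** Runs every registered ping; a failure is logged and does not stop the others. */
    public void pingAll() {
        for (Map.Entry<String, Runnable> e : pings.entrySet()) {
            try {
                e.getValue().run();
            } catch (RuntimeException ex) {
                System.err.println("Warm-up of " + e.getKey() + " failed: " + ex);
            }
        }
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

In a Spring application this would typically be wired up in a bean that receives the channels and calls `start(...)` once the context is ready; how exactly depends on your configuration.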

If this isn't satisfactory to you, you can open an issue over at grpc-java or ask on Stack Overflow; they might know more about how to debug the wait time. Please leave a link here if you do so, so other people (me) can also learn from this and maybe add a new feature for it.

Does this help you?

313hemant313 commented 1 year ago

Thanks, I will try the getImmediateConnectTimeout approach first. What should the duration value be here? Should I try 60 sec?

ST-DDT commented 1 year ago

The service won't be reported as up for up to that duration per connection so maybe 15s? The initial connection usually shouldn't take that long.

313hemant313 commented 1 year ago

Okay, will try and report back.

313hemant313 commented 1 year ago

The service won't be reported as up for up to that duration per connection so maybe 15s? The initial connection usually shouldn't take that long.

Is there any health check mechanism to check whether the services are up?

I checked https://github.com/yidongnan/grpc-spring-boot-starter/issues/461 and https://yidongnan.github.io/grpc-spring-boot-starter/en/actuator.html, but wanted to check whether there is any special handling for the above scenario.

ST-DDT commented 1 year ago

Are you referring to something like this?

https://github.com/grpc/grpc-java/blob/master/services/src/generated/main/grpc/io/grpc/health/v1/HealthGrpc.java

313hemant313 commented 1 year ago

The service won't be reported as up for up to that duration per connection so maybe 15s? The initial connection usually shouldn't take that long.

Yes, correct. So after 15s, calling the Check API should return the SERVING status, right?

https://github.com/grpc/grpc-java/blob/master/services/src/generated/main/grpc/io/grpc/health/v1/HealthGrpc.java

enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;  // Used only by the Watch method.
}

ST-DDT commented 1 year ago

Well, you are mixing the two solutions here. The immediate connect timeout causes the client to establish the connection immediately, without waiting for a request; the client application won't continue starting before that. The HealthGrpc request is an actual request that you can send to force the connection to be created, or just to check the other service's health if it happens to implement/provide that API. If the server is up and running, you will get SERVING as a response immediately (after the initial connection delay).

313hemant313 commented 1 year ago

Oh okay, so ImmediateConnectTimeout is a client property.

In the example shared above (ServiceOrchestrator, ServiceX and ServiceY), should I set ImmediateConnectTimeout to 15 sec in the ServiceOrchestrator service?

When deploying ServiceOrchestrator, we should route traffic to the new release only after 15 sec, right, to avoid the latency of the first call? And should we somehow get NOT_SERVING from the health check until then?

ST-DDT commented 1 year ago

This depends on your requirements and setup.

If you have only a single instance of your service, there might not be much difference between calling it too early and running into a timeout, and the service not being ready: in both cases the request fails. If you have multiple instances of said service, then waiting for Spring to report ready might be a good idea. Spring Boot Actuator provides a health/up endpoint that you can use for that (see also readiness and liveness probes).

https://spring.io/blog/2020/03/25/liveness-and-readiness-probes-with-spring-boot

I don't know whether your server implements the health service; this library provides it via this auto-config if the dependencies are present: https://github.com/yidongnan/grpc-spring-boot-starter/blob/master/grpc-server-spring-boot-autoconfigure/src/main/java/net/devh/boot/grpc/server/autoconfigure/GrpcHealthServiceAutoConfiguration.java#L56

313hemant313 commented 1 year ago

Assume a blue-green deployment (the old services are running, and we start routing traffic to the new service once it is in the ready state).

For ServiceOrchestrator, we should route traffic to the new release only after 15 sec, right, to avoid the latency of the first call? And should we somehow get NOT_SERVING from the health check until then?

ST-DDT commented 1 year ago

ServiceOrchestrator we should route the traffic to new release only after 15sec right ?

When the service is up and running. This may or may not be 15s.

313hemant313 commented 1 year ago

Okay, got it. So if all the stubs are ready before the 15 sec elapse, the app will be in the SERVING state.

ST-DDT commented 1 year ago

15s per distinct target, but yes.
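To make the "per distinct target" point concrete, here is a small sketch of how such a startup wait adds up. It assumes the connections are awaited one after another, each with its own timeout, so that the worst case is the timeout multiplied by the number of targets. The class `StartupConnectGate` and the futures representing "connected" are purely illustrative and are not the library's actual implementation.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of a connect-on-startup wait, one target after another.
public class StartupConnectGate {

    /**
     * Waits for each target in turn, each with its own timeout.
     * Throws IllegalStateException for the first target that does not connect in time.
     */
    public static void awaitAll(Map<String, CompletableFuture<Void>> connections,
                                Duration perTargetTimeout) {
        for (Map.Entry<String, CompletableFuture<Void>> e : connections.entrySet()) {
            try {
                e.getValue().get(perTargetTimeout.toMillis(), TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException ex) {
                throw new IllegalStateException("Could not connect to " + e.getKey(), ex);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while connecting to " + e.getKey(), ex);
            }
        }
    }

    /** Upper bound on the startup delay if every target uses up its full timeout. */
    public static Duration worstCaseDelay(int distinctTargets, Duration perTargetTimeout) {
        return perTargetTimeout.multipliedBy(distinctTargets);
    }
}
```

With two targets (ServiceX and ServiceY) and a 15s timeout, `worstCaseDelay(2, Duration.ofSeconds(15))` is 30 seconds; when both connect quickly, the wait ends as soon as the last one is connected.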

313hemant313 commented 1 year ago

With ImmediateConnectTimeout, the app startup fails if any of the gRPC client connections fails, as @GrpcClient is a mandatory bean.

Is there any workaround for this, like Autowired(required = false)?

ST-DDT commented 1 year ago

Configure that property only for mandatory clients or make that first connection in a different way.

313hemant313 commented 1 year ago

Even after using ImmediateConnectTimeout, I am facing the same issue.

The latency from ServiceOrchestrator to ServiceX is 2 sec, but the latency recorded in ServiceX is 300 ms.

ST-DDT commented 1 year ago

Please try to analyse the network using Wireshark or a similar tool. If that doesn't help you, please ask upstream (grpc-java) or on Stack Overflow for help, as I currently have no other ideas what issues you might have. If you open a question elsewhere, please post a link here so I/others can also learn new suggestions and solutions. Maybe they can be added as a feature in the future.

313hemant313 commented 1 year ago

This seems to be some warm-up issue.

https://github.com/grpc/grpc-java/issues/1758
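One way to check whether the cause really is a warm-up effect is to time the same call several times in a row from the client side and compare the first duration with the later ones. The sketch below is only an illustration: `FirstCallProbe` is a hypothetical helper, and the `Runnable` stands in for a blocking stub call from ServiceOrchestrator to ServiceX.

```java
// Hypothetical helper for comparing the first call with subsequent calls.
public class FirstCallProbe {

    /** Invokes the call n times and returns each invocation's duration in nanoseconds. */
    public static long[] timeCalls(Runnable call, int n) {
        long[] nanos = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            call.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Placeholder workload; in practice this would be a blocking gRPC stub call.
        long[] nanos = timeCalls(() -> String.format("%d", 42), 5);
        for (int i = 0; i < nanos.length; i++) {
            System.out.println("call " + (i + 1) + ": " + nanos[i] / 1_000 + " us");
        }
    }
}
```

If only the first entry is large, sending a few throwaway requests before the instance receives real traffic may hide the delay from users.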