grpc / grpc-swift

The Swift language implementation of gRPC.

Connection hangs and RPCs fail with "error 14 The operation couldn’t be completed" #1592

Open IrisG33 opened 1 year ago

IrisG33 commented 1 year ago

Describe the bug

The RPCs start to fail when connectivity changes, e.g. wifi -> cellular, wifi -> disabled, or toggling airplane mode on/off. All later RPCs fail; only after restarting the app (recreating the GRPCConnection) can RPCs succeed again.

Here is our configuration:

let keepalive = ClientConnectionKeepalive(interval: .seconds(15), timeout: .seconds(10))
let eventLoopGroup = MultiThreadedEventLoopGroup(numberOfThreads: 1)
let channel = try GRPCChannelPool.with(
  target: .hostAndPort(hostname, port),
  transportSecurity: .tls(GRPCTLSConfiguration.makeClientConfigurationBackedByNIOSSL()),
  eventLoopGroup: eventLoopGroup
) { configuration in
  configuration.keepalive = keepalive
}

Some gRPC trace logs:

A request starts:

12:28:25 PM VERBOSE <V> [1681845175315102] Logger: [RPC TRACE] HTTP2 stream created function:log(level:message:metadata:source:file:function:line:)[130:12]

Some connectivity changes...

12:28:43 PM VERBOSE <V> [1681845175315102] Logger: [RPC TRACE] deactivating connection function:log(level:message:metadata:source:file:function:line:)[130:12]
12:28:43 PM VERBOSE <V> [1681845175315102] Logger:[RPC TRACE] scheduling connection attempt function:log(level:message:metadata:source:file:function:line:)[130:12]
12:28:43 PM VERBOSE <V> [1681845175315102] Logger:[RPC TRACE] making client bootstrap with event loop group of type SelectableEventLoop function:log(level:message:metadata:source:file:function:line:)[130:12]
12:28:43 PM VERBOSE <V> [1681845175315102] Logger:[RPC TRACE] Network.framework is available but the EventLoopGroup is not compatible with NIOTS, falling back to ClientBootstrap function:log(level:message:metadata:source:file:function:line:)[130:12]
12:28:43 PM VERBOSE <V> [1681845175315102] Logger: [RPC TRACE] creating a ClientBootstrap function:log(level:message:metadata:source:file:function:line:)[130:12]

Then a few pending requests fail:

12:28:43 PM ERROR   <E> [1681845175315102]  14 The operation couldn’t be completed.
12:28:43 PM ERROR   <E> [1681845175315102]  14 The operation couldn’t be completed.
...

Each newly enqueued request continues to produce the following gRPC trace logs:

12:28:45 PM VERBOSE <V> [1681845175315102] Logger: [RPC TRACE] enqueued connection waiter function:log(level:message:metadata:source:file:function:line:)[130:12]
12:28:46 PM VERBOSE <V> [1681845175315102] Logger: [RPC TRACE] making client bootstrap with event loop group of type SelectableEventLoop function:log(level:message:metadata:source:file:function:line:)[130:12]
12:28:46 PM VERBOSE <V> [1681845175315102] Logger: [RPC TRACE] Network.framework is available but the EventLoopGroup is not compatible with NIOTS, falling back to ClientBootstrap function:log(level:message:metadata:source:file:function:line:)[130:12]
12:28:46 PM VERBOSE <V> [1681845175315102] Logger: [RPC TRACE] creating a ClientBootstrap function:log(level:message:metadata:source:file:function:line:)[130:12]
12:28:46 PM VERBOSE <V> [1681845175315102] Logger: [RPC TRACE] scheduling connection attempt function:log(level:message:metadata:source:file:function:line:)[130:12]

After connectivity recovered, no RPCs succeeded until the app was forcibly evicted. The only trace logging we received from gRPC was an enqueued connection waiter:

12:31:42 PM VERBOSE <V> [1681845175315102] Logger: [RPC TRACE] enqueued connection waiter function:log(level:message:metadata:source:file:function:line:)[130:12]

grpc-swift version: 1.13.2 (release build)
Platform: iOS

To reproduce

This is not 100% reproducible, but it occurs quite often under the connectivity-change pattern described above.

Expected behaviour

The connection should recover when connectivity is back up.

Please advise: is there a configuration we can set on the gRPC channel, or a recommended way to restart/manage the gRPC connection?

glbrntt commented 1 year ago

Could you provide complete logs?

How long are you waiting between connectivity recovering and evicting the app? If a connect fails then the backoff period between attempts will grow exponentially (up to a limit) -- it might just be that you're not waiting long enough for the next connect attempt.
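
For illustration, a minimal sketch of how that delay grows, assuming gRPC's documented defaults (initial backoff 1s, multiplier 1.6, capped at 120s; jitter omitted):

// Illustration only: exponential connection backoff growth using
// gRPC's documented defaults. Jitter is omitted for clarity.
let initialBackoff = 1.0
let multiplier = 1.6
let maximumBackoff = 120.0

var delay = initialBackoff
for attempt in 1...6 {
    print("attempt \(attempt): wait \(delay)s before reconnecting")
    delay = min(delay * multiplier, maximumBackoff)
}
// attempt 1: 1.0s, attempt 2: 1.6s, attempt 3: ~2.56s, ...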

IrisG33 commented 1 year ago

Could you provide complete logs?

Unfortunately, these are all the logs I have collected so far. It seems the RPC-level logging is missing? I set a logger for both ClientConnection.Builder.withBackgroundActivityLogger and CallOptions.

let callOptions = CallOptions(logger: self.debugTraceLogger)

let channel = try GRPCChannelPool.with(
  target: .hostAndPort(hostname, portNum),
  transportSecurity: .tls(GRPCTLSConfiguration.makeClientConfigurationBackedByNIOSSL()),
  eventLoopGroup: self.eventLoopGroup
) { configuration in
  configuration.backgroundActivityLogger = self.debugTraceLogger
}

Before evicting the app, we cancel the RPC if it doesn't respond within 20s and retry it. Considering the default minimumConnectionTimeout is also 20s, do you suggest waiting a little longer (say 25s) before retrying?

Shouldn't gRPC try to reconnect after the backoff period, since we have enqueued a bunch of retried RPCs?

Instead of backing off exponentially, is there a way to fail early so we can get notified and recreate the connection?

IrisG33 commented 1 year ago

In terms of the gRPC connection behavior during the backoff period, I found this documentation.

During the backoff period, any call to the same host will wait until the first of the following events occurs:
* Connection succeeded; calls will be made using this channel;
* Connection failed; calls will be failed and return UNAVAILABLE status code;
* The call's deadline is reached; the call will fail and return DEADLINE_EXCEEDED status code.

The length of the backoff period of a channel is reset whenever a connection attempt is successful.

Does gRPC Swift follow this behavior?

Also, if an ongoing RPC is cancelled by the caller, does that affect the backoff behavior, e.g. by resetting the backoff timer?

IrisG33 commented 1 year ago

Bump this thread.

We are still encountering failures, mostly under network transitions or poor network conditions (reproduced on a 3G network).

[RPC TRACE] shutting down connection, no reconnect configured/remaining function:log(level:message:metadata:source:file:function:line:)[143:12]

However, the failure doesn't occur for RPCs using the legacy CgRPC-based Swift gRPC.

Any ideas? Thanks in advance.

glbrntt commented 1 year ago

Could you provide complete logs?

Unfortunately, these are all the logs I have collected so far. It seems the RPC-level logging is missing? I set a logger for both ClientConnection.Builder.withBackgroundActivityLogger and CallOptions.

There should be more logs than this... we should at least see the connection state changes (i.e. idle to connecting and so on). How are you creating your logger?

Shouldn't gRPC try to reconnect after the backoff period, since we have enqueued a bunch of retried RPCs?

gRPC will reconnect after the backoff period if there are RPCs queued up.

Instead of backing off exponentially, is there a way to fail early so we can get notified and recreate the connection?

You can configure the backoff settings used, so you could configure it to use no backoff if you wanted to recreate a connection immediately.
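
For example, a sketch of dialling the backoff right down via the channel pool configuration (the parameter values here are illustrative, and I'm assuming the same target/TLS/event loop setup as your earlier snippet):

// Sketch: minimise the reconnect delay on the channel pool.
let channel = try GRPCChannelPool.with(
  target: .hostAndPort(hostname, port),
  transportSecurity: .tls(GRPCTLSConfiguration.makeClientConfigurationBackedByNIOSSL()),
  eventLoopGroup: eventLoopGroup
) { configuration in
  configuration.connectionBackoff = ConnectionBackoff(
    initialBackoff: 0.1,  // start reconnecting almost immediately
    maximumBackoff: 1.0,  // never wait more than 1s between attempts
    multiplier: 1.0,      // fixed delay rather than exponential growth
    jitter: 0.0
  )
}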

In terms of the gRPC connection behavior during the backoff period, I found this documentation.

During the backoff period, any call to the same host will wait until the first of the following events occurs:
* Connection succeeded; calls will be made using this channel;
* Connection failed; calls will be failed and return UNAVAILABLE status code;
* The call's deadline is reached; the call will fail and return DEADLINE_EXCEEDED status code.

The length of the backoff period of a channel is reset whenever a connection attempt is successful.

Does gRPC Swift follow this behavior?

Not quite: if the connection fails, the RPC will not be failed. It will stay queued until the deadline is reached (or the waiting deadline is reached on the connection pool).
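
So if you want queued RPCs to fail sooner, the lever is the RPC's own time limit. A sketch via CallOptions (someCall is a placeholder for one of your generated client methods):

// Sketch: give each call an explicit time limit so it fails with
// DEADLINE_EXCEEDED rather than waiting in the queue indefinitely.
let options = CallOptions(timeLimit: .timeout(.seconds(10)))
let call = client.someCall(request, callOptions: options)  // placeholder RPC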

Also, if an ongoing RPC is cancelled by the caller, does that affect the backoff behavior, e.g. by resetting the backoff timer?

No, the backoff timer will not be affected.


I think we need to see the full logs here to understand what's going on with your connection.

IrisG33 commented 1 year ago

There should be more logs than this... we should at least see the connection state changes (i.e. idle to connection and so on). How are you creating your logger?

Ah, after using the .debug log level, I got more logs. I have set the backoff to a fixed 5s with retries upTo(1), and keepalive(interval: .seconds(15), timeout: .seconds(10)).
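
Roughly, the builder configuration looks like this (a sketch; target and logger details elided):

// Sketch of the configuration described above.
let connection = ClientConnection
  .usingPlatformAppropriateTLS(for: eventLoopGroup)
  .withConnectionBackoff(initial: .seconds(5))
  .withConnectionBackoff(maximum: .seconds(5))
  .withConnectionBackoffRetries(.upTo(1))
  .withKeepalive(ClientConnectionKeepalive(interval: .seconds(15), timeout: .seconds(10)))
  .withBackgroundActivityLogger(debugTraceLogger)
  .connect(host: hostname, port: port)

With that in place, here are the logs: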

2023-06-06T13:38:41-0700 debug [grpc.swift.nio] : old_state=idle new_state=connecting grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 connectivity state change
2023-06-06T13:38:41-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 connectivity_state=connecting vending multiplexer future
2023-06-06T13:38:41-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 making client bootstrap with event loop group of type NIOTSEventLoop
2023-06-06T13:38:41-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 Network.framework is available and the EventLoopGroup is compatible with NIOTS, creating a NIOTSConnectionBootstrap
2023-06-06T13:38:41-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 connectivity_state=connecting activating connection
2023-06-06T13:38:41-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 TLS handshake completed, negotiated protocol: grpc-exp
2023-06-06T13:38:41-0700 debug [grpc.swift.nio] : grpc.conn.addr_remote=142.251.2.81 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc.conn.addr_local=192.168.86.26 alpn=grpc-exp tls_version=tlsv13 TLS handshake completed
2023-06-06T13:38:41-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 h2_settings_initial_window_size=1048576 h2_settings_max_concurrent_streams=100 h2_settings_max_header_list_size=65536 grpc.conn.addr_remote=142.251.2.81 grpc.conn.addr_local=192.168.86.26 HTTP2 settings update
2023-06-06T13:38:41-0700 debug [grpc.swift.nio] : connectivity_state=active grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 connection ready
2023-06-06T13:38:41-0700 debug [grpc.swift.nio] : new_state=ready old_state=connecting grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 connectivity state change
2023-06-06T13:38:42-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc_request_id=51DFE2C5-63E3-4D1C-A266-216695A29854 grpc.conn.addr_remote=142.251.2.81 grpc.conn.addr_local=192.168.86.26 activated stream channel
2023-06-06T13:38:42-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc.conn.addr_remote=142.251.2.81 grpc.conn.addr_local=192.168.86.26 h2_stream_id=HTTP2StreamID(1) h2_active_streams=1 HTTP2 stream created
2023-06-06T13:38:44-0700 debug [grpc.swift.nio] : grpc.conn.addr_local=192.168.86.26 grpc.conn.addr_remote=142.251.2.81 h2_stream_id=HTTP2StreamID(1) h2_active_streams=0 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 HTTP2 stream closed
2023-06-06T13:38:59-0700 debug [grpc.swift.nio] : connectivity_state=ready grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 vending multiplexer future
2023-06-06T13:38:59-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc_request_id=2A4D54A9-EAF9-45A8-BE94-7B3381F7ED3C grpc.conn.addr_remote=142.251.2.81 grpc.conn.addr_local=192.168.86.26 activated stream channel
2023-06-06T13:38:59-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc.conn.addr_local=192.168.86.26 grpc.conn.addr_remote=142.251.2.81 h2_active_streams=1 h2_stream_id=HTTP2StreamID(3) HTTP2 stream created
2023-06-06T13:39:01-0700 debug [grpc.swift.nio] : connectivity_state=ready grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 vending multiplexer future
2023-06-06T13:39:01-0700 debug [grpc.swift.nio] : grpc_request_id=21CD6B32-5A02-4E7D-8B1B-4048B4FE66CD grpc.conn.addr_local=192.168.86.26 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc.conn.addr_remote=142.251.2.81 activated stream channel
2023-06-06T13:39:01-0700 debug [grpc.swift.nio] : grpc.conn.addr_remote=142.251.2.81 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc.conn.addr_local=192.168.86.26 h2_stream_id=HTTP2StreamID(5) h2_active_streams=2 HTTP2 stream created
2023-06-06T13:39:03-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 connectivity_state=ready vending multiplexer future
2023-06-06T13:39:03-0700 debug [grpc.swift.nio] : grpc.conn.addr_local=192.168.86.26 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc_request_id=DF61C57C-7AC3-4617-98A6-C8976B72344A grpc.conn.addr_remote=142.251.2.81 activated stream channel
2023-06-06T13:39:03-0700 debug [grpc.swift.nio] : grpc.conn.addr_remote=142.251.2.81 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc.conn.addr_local=192.168.86.26 h2_stream_id=HTTP2StreamID(7) h2_active_streams=3 HTTP2 stream created
2023-06-06T13:39:05-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 connectivity_state=ready vending multiplexer future
2023-06-06T13:39:05-0700 debug [grpc.swift.nio] : grpc.conn.addr_remote=142.251.2.81 grpc.conn.addr_local=192.168.86.26 grpc_request_id=E37D4689-8793-49E9-89C7-A6EBDFA4865D grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 activated stream channel
2023-06-06T13:39:05-0700 debug [grpc.swift.nio] : grpc.conn.addr_remote=142.251.2.81 h2_active_streams=4 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc.conn.addr_local=192.168.86.26 h2_stream_id=HTTP2StreamID(9) HTTP2 stream created
2023-06-06T13:39:07-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 connectivity_state=ready vending multiplexer future
2023-06-06T13:39:07-0700 debug [grpc.swift.nio] : grpc.conn.addr_local=192.168.86.26 grpc.conn.addr_remote=142.251.2.81 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc_request_id=CC331A7A-B586-42DA-B030-A948050314C2 activated stream channel
2023-06-06T13:39:07-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 connectivity_state=ready deactivating connection
2023-06-06T13:39:07-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 delay=0 scheduling connection attempt
2023-06-06T13:39:07-0700 debug [grpc.swift.nio] : old_state=ready new_state=transientFailure grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 connectivity state change
2023-06-06T13:39:07-0700 error [grpc.swift.nio] : error=unavailable (14): Transport became inactive grpc_request_id=21CD6B32-5A02-4E7D-8B1B-4048B4FE66CD grpc.conn.addr_local=192.168.86.26 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc.conn.addr_remote=142.251.2.81 grpc client error
2023-06-06T13:39:07-0700 error [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc_request_id=2A4D54A9-EAF9-45A8-BE94-7B3381F7ED3C grpc.conn.addr_remote=142.251.2.81 error=unavailable (14): Transport became inactive grpc.conn.addr_local=192.168.86.26 grpc client error
2023-06-06T13:39:07-0700 error [grpc.swift.nio] : grpc.conn.addr_local=192.168.86.26 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc_request_id=DF61C57C-7AC3-4617-98A6-C8976B72344A error=unavailable (14): Transport became inactive grpc.conn.addr_remote=142.251.2.81 grpc client error
2023-06-06T13:39:07-0700 error [grpc.swift.nio] : grpc.conn.addr_local=192.168.86.26 grpc.conn.addr_remote=142.251.2.81 error=unavailable (14): Transport became inactive grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc_request_id=CC331A7A-B586-42DA-B030-A948050314C2 grpc client error
2023-06-06T13:39:07-0700 error [grpc.swift.nio] : error=unavailable (14): Transport became inactive grpc.conn.addr_remote=142.251.2.81 grpc.conn.addr_local=192.168.86.26 grpc_request_id=E37D4689-8793-49E9-89C7-A6EBDFA4865D grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/0 grpc client error
2023-06-06T13:39:07-0700 debug [grpc.swift.nio] : new_state=connecting grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 old_state=transientFailure connectivity state change
2023-06-06T13:39:07-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 making client bootstrap with event loop group of type NIOTSEventLoop
2023-06-06T13:39:07-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 Network.framework is available and the EventLoopGroup is compatible with NIOTS, creating a NIOTSConnectionBootstrap
2023-06-06T13:39:09-0700 debug [grpc.swift.nio] : connectivity_state=connecting grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 vending multiplexer future
2023-06-06T13:39:11-0700 debug [grpc.swift.nio] : connectivity_state=connecting grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 vending multiplexer future
2023-06-06T13:39:13-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 connectivity_state=connecting vending multiplexer future
2023-06-06T13:39:15-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 connectivity_state=connecting vending multiplexer future
2023-06-06T13:39:17-0700 debug [grpc.swift.nio] : connectivity_state=connecting grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 vending multiplexer future
2023-06-06 13:39:19.536423-0700 DemoApp[45379:21322745] [SystemGestureGate] <0x11c505890> Gesture: System gesture gate timed out.
2023-06-06T13:39:27-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 delay=5.0 scheduling connection attempt
2023-06-06T13:39:27-0700 debug [grpc.swift.nio] : old_state=connecting grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 new_state=transientFailure connectivity state change
2023-06-06T13:39:32-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 old_state=transientFailure new_state=connecting connectivity state change
2023-06-06T13:39:32-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 making client bootstrap with event loop group of type NIOTSEventLoop
2023-06-06T13:39:32-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 Network.framework is available and the EventLoopGroup is compatible with NIOTS, creating a NIOTSConnectionBootstrap
2023-06-06T13:39:39-0700 error [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 error=RPC timed out before completing grpc_request_id=A7B81801-0159-40F3-9E58-7DE9067C3763 grpc client error
2023-06-06T13:39:41-0700 error [grpc.swift.nio] : error=RPC timed out before completing grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 grpc_request_id=38DC3E99-315C-4668-8507-5C6DA97ADCD5 grpc client error
2023-06-06T13:39:42-0700 debug [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 shutting down connection, no reconnect configured/remaining
2023-06-06T13:39:42-0700 debug [grpc.swift.nio] : new_state=shutdown old_state=connecting grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 connectivity state change
2023-06-06T13:39:42-0700 error [grpc.swift.nio] : grpc_request_id=8A48923E-A271-4C55-A38B-1BDC4D028669 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc client error
2023-06-06T13:39:42-0700 error [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 grpc_request_id=FCD2EB9A-52AA-4DC1-B48A-98E4C0B7784B error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc client error
2023-06-06T13:39:42-0700 error [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/1 grpc_request_id=5D325A41-5A84-4780-9943-6BEACAB22D2A error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc client error
2023-06-06T13:40:03-0700 debug [grpc.swift.nio] : connectivity_state=shutdown grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 vending multiplexer future
2023-06-06T13:40:03-0700 error [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc_request_id=01BD1EED-F3C0-49E8-BC62-D91D1566C966 grpc client error
2023-06-06T13:40:05-0700 debug [grpc.swift.nio] : connectivity_state=shutdown grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 vending multiplexer future
2023-06-06T13:40:05-0700 error [grpc.swift.nio] : error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 grpc_request_id=DD3EDBD6-0A2B-42E9-B82D-1AA21F0A9256 grpc client error
2023-06-06T13:40:07-0700 debug [grpc.swift.nio] : connectivity_state=shutdown grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 vending multiplexer future
2023-06-06T13:40:07-0700 error [grpc.swift.nio] : error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc_request_id=A1D850B7-39C3-428E-9C73-F9F387E659F5 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 grpc client error
2023-06-06T13:40:09-0700 debug [grpc.swift.nio] : connectivity_state=shutdown grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 vending multiplexer future
2023-06-06T13:40:09-0700 error [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 grpc_request_id=C7FEABDF-4892-4CA3-86D8-E01FC0BE74C5 error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc client error
2023-06-06T13:40:11-0700 debug [grpc.swift.nio] : connectivity_state=shutdown grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 vending multiplexer future
2023-06-06T13:40:11-0700 error [grpc.swift.nio] : error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 grpc_request_id=42AFD664-34F6-43B9-ADBB-D2B01D251DCD grpc client error
2023-06-06T13:40:13-0700 debug [grpc.swift.nio] : connectivity_state=shutdown grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 vending multiplexer future
2023-06-06T13:40:13-0700 error [grpc.swift.nio] : error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc_request_id=EAE73BA6-7B4D-428E-A3A7-8753B24B696B grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 grpc client error
2023-06-06T13:40:15-0700 debug [grpc.swift.nio] : connectivity_state=shutdown grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 vending multiplexer future
2023-06-06T13:40:15-0700 error [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc_request_id=CAF7D5FA-FB7B-4D04-8385-6D1992DD4D39 grpc client error
2023-06-06T13:40:17-0700 debug [grpc.swift.nio] : connectivity_state=shutdown grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 vending multiplexer future
2023-06-06T13:40:17-0700 error [grpc.swift.nio] : grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 grpc_request_id=EFFD997B-055C-4B6A-9B9E-9341435663E3 error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc client error
2023-06-06T13:40:19-0700 debug [grpc.swift.nio] : connectivity_state=shutdown grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 vending multiplexer future
2023-06-06T13:40:19-0700 error [grpc.swift.nio] : error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc_request_id=C4C64CB1-D3AC-4D25-B9D4-87174CAF6C49 grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 grpc client error
2023-06-06T13:40:21-0700 debug [grpc.swift.nio] : connectivity_state=shutdown grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 vending multiplexer future
2023-06-06T13:40:21-0700 error [grpc.swift.nio] : error=connectTimeout(NIOCore.TimeAmount(nanoseconds: 10000000000)) grpc_connection_id=5F9A8F2D-0FA5-48C5-8C25-1E26B41E83D1/2 grpc_request_id=0236B03A-EABA-4031-99A7-531C63DB19E9 grpc client error

My understanding is that under poor network conditions the connection enters the unavailable (14) state and gRPC attempts to reconnect with the configured backoff. If all the retries fail, the connection is shut down. Later, even when the network is back up, new RPCs will fail. Is this right?

If so, since I prefer to fail fast, I'll use a short backoff (or no backoff) and use the ConnectivityStateDelegate to recreate the connection when it is shut down. Any advice?

Thank you so much!

glbrntt commented 1 year ago

My understanding is that under poor network conditions the connection enters the unavailable (14) state and gRPC attempts to reconnect with the configured backoff. If all the retries fail, the connection is shut down. Later, even when the network is back up, new RPCs will fail. Is this right?

That's correct: the shutdown state is terminal; once it's shut down there's no recovering and you'll need a new connection. Setting a retry limit is not really all that helpful with the connection pool since it manages connections for you. Instead, if you want to fail quickly you should consider setting a short maxWaitTime, as this will fail the RPC if no connection is available after that time (this is separate from the RPC deadline).
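
For example (a sketch; I'm assuming the wait time is exposed on the pool configuration as below):

// Sketch: fail an RPC if no connection becomes available within 5s.
let channel = try GRPCChannelPool.with(
  target: .hostAndPort(hostname, port),
  transportSecurity: .tls(GRPCTLSConfiguration.makeClientConfigurationBackedByNIOSSL()),
  eventLoopGroup: eventLoopGroup
) { configuration in
  configuration.connectionPool.maxWaitTime = .seconds(5)
}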

IrisG33 commented 1 year ago

I'm using ClientConnection rather than GRPCChannelPool, because I'm using a ConnectivityStateDelegate to restart the connection when it closes. But ClientConnection doesn't seem to have maxWaitTime.

Is there an equivalent configuration for ClientConnection, or alternatively, is there a ConnectivityStateDelegate for GRPCChannelPool?

glbrntt commented 1 year ago

Oh, in that case you should set the callStartBehavior to fastFailure: if you try to start an RPC and the connection is backing off then it will fail immediately. This means you can just use a 'normal' backoff without limiting the number of retries.
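
A sketch of that setup on the builder, reusing the delegate you already have (stateDelegate here is a placeholder name):

// Sketch: fail RPCs immediately while the connection is backing off,
// and keep the existing connectivity-state delegate.
let connection = ClientConnection
  .usingPlatformAppropriateTLS(for: eventLoopGroup)
  .withCallStartBehavior(.fastFailure)
  .withConnectivityStateDelegate(stateDelegate)  // placeholder for your delegate
  .connect(host: hostname, port: port)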