grpc / grpc-swift

The Swift language implementation of gRPC.

One RPC mysteriously using a different connection ID than others #1420

Closed CooperWolfe closed 2 years ago

CooperWolfe commented 2 years ago

Describe the bug

One RPC, sharing the same ClientConnection as others, contains a different grpc_connection_id in its logs than the others.

To reproduce

When my application launches, I immediately see the following logs, with no RPC calls made:

2022-05-31T12:39:50-0400 debug logging : connectivity_state=idle grpc_connection_id=0ABFF424-2158-4657-8D7A-E0B0F4BC34D7/0 shutdown.mode=forceful shutting down connection
2022-05-31T12:39:50-0400 debug logging : grpc_connection_id=0ABFF424-2158-4657-8D7A-E0B0F4BC34D7/0 new_state=shutdown old_state=idle connectivity state change

I then make an RPC call (RPC call A), which is followed by this log:

2022-05-31T12:39:50-0400 debug logging : connectivity_state=shutdown grpc_connection_id=0ABFF424-2158-4657-8D7A-E0B0F4BC34D7/0 vending multiplexer future

At this point, the server still shows no signs of having been contacted.

It's worth mentioning that this RPC call (RPC call A) lives in a service alongside another service (with RPC call B) on the same server. I also have another service (with RPC call C) living on another server. All these services are contacted through a gateway and I am able to make all three RPC calls as expected via BloomRPC.

RPC calls B and C are both working as expected in Swift, with logs similar to the following:

2022-05-31T12:39:52-0400 debug logging: grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 new_state=connecting old_state=idle connectivity state change
2022-05-31T12:39:52-0400 debug logging : connectivity_state=connecting grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 vending multiplexer future
2022-05-31T12:39:52-0400 debug logging : grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 making client bootstrap with event loop group of type NIOTSEventLoop
2022-05-31T12:39:52-0400 debug logging : grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 Network.framework is available and the EventLoopGroup is compatible with NIOTS, creating a NIOTSConnectionBootstrap
2022-05-31T12:39:52-0400 debug logging : connectivity_state=connecting grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 activating connection
2022-05-31T12:39:53-0400 debug logging : grpc.conn.addr_local=<redacted> grpc.conn.addr_remote=<redacted> grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 h2_settings_enable_connect_protocol=0 h2_settings_header_table_size=4096 h2_settings_initial_window_size=268435456 h2_settings_max_concurrent_streams=2147483647 HTTP2 settings update
2022-05-31T12:39:53-0400 debug logging : connectivity_state=active grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 connection ready
2022-05-31T12:39:53-0400 debug logging : grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 new_state=ready old_state=connecting connectivity state change
2022-05-31T12:39:53-0400 debug logging : grpc.conn.addr_local=<redacted> grpc.conn.addr_remote=<redacted> grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 h2_active_streams=1 h2_stream_id=HTTP2StreamID(1) HTTP2 stream created
2022-05-31T12:39:53-0400 debug logging : grpc.conn.addr_local=<redacted> grpc.conn.addr_remote=<redacted> grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 h2_active_streams=0 h2_stream_id=HTTP2StreamID(1) HTTP2 stream closed
2022-05-31T12:49:52-0400 debug logging : connectivity_state=ready grpc_connection_id=199DCEC5-DFFB-4E9D-A1F0-EB339385BCF0/0 vending multiplexer future

What is particularly interesting is that BOTH calls have the same grpc_connection_id each time I make them. However, every time RPC call A is executed, it uses the grpc_connection_id that was mysteriously forcefully shut down at the start of the application. All three calls are BidirectionalStreamingCalls. All three clients share the same ClientConnection instance and CallOptions. All three calls are made on the same DispatchQueue.
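For reference, the setup described above looks roughly like the following sketch (service and client names are hypothetical; the builder API is grpc-swift v1's):

```swift
import GRPC
import NIOCore

// One EventLoopGroup and one ClientConnection, shared by all clients.
let group = PlatformSupport.makeEventLoopGroup(loopCount: 1)

let connection = ClientConnection
  .insecure(group: group)  // or a TLS builder in production
  .connect(host: "gateway.example.com", port: 443)

let options = CallOptions()

// Hypothetical generated clients for the three services; all share
// the same channel and call options.
let clientA = ServiceAClient(channel: connection, defaultCallOptions: options)
let clientB = ServiceBClient(channel: connection, defaultCallOptions: options)
let clientC = ServiceCClient(channel: connection, defaultCallOptions: options)
```

With this setup, all three clients should log the same grpc_connection_id, which is what makes the observed behaviour surprising.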

Expected behaviour

I would expect RPC call A to have the same grpc_connection_id as B and C. I would also expect no forceful shutdown at the start of the application; mostly I just want to get the call working.

Additional information

N/A

glbrntt commented 2 years ago

When my application launches, I immediately see the following logs, with no RPC calls made:

2022-05-31T12:39:50-0400 debug logging : connectivity_state=idle grpc_connection_id=0ABFF424-2158-4657-8D7A-E0B0F4BC34D7/0 shutdown.mode=forceful shutting down connection
2022-05-31T12:39:50-0400 debug logging : grpc_connection_id=0ABFF424-2158-4657-8D7A-E0B0F4BC34D7/0 new_state=shutdown old_state=idle connectivity state change

It seems to me that the connection is in the process of closing (the first log is "shutting down connection") when you attempt to call RPC A. Because it shuts down and is replaced by a new connection, we see a different connection ID.

The "vending multiplexer future" log does not mean that the RPC has started; it means that the connection is attempting to provide a stream for the RPC. This will always fail because the connection is shutting down.

CooperWolfe commented 2 years ago

Does the second log not indicate the completion of that state change? I ask because if I never send a request those are the only two logs I see. I can then send RPC A minutes later to no avail.

Also, I can successfully send B and C and then unsuccessfully send A. Once again in this scenario, A logs vending multiplexer future with the grpc_connection_id that shut down and B and C use the working one.

glbrntt commented 2 years ago

Yes, you're right, the second log indicates that. The third log also tells us that the connection is shut down (connectivity_state=shutdown) at that point. Shutdown is a terminal state, so B and C must be running on a connection provided by an entirely different object.

Is the connection being explicitly shut down by your application?
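For context, an explicit shutdown in grpc-swift looks something like this minimal sketch, where `connection` is a `ClientConnection`:

```swift
// Explicitly closing a ClientConnection. Once closed, the connection
// enters the terminal `shutdown` state, and any RPC subsequently
// started on it will fail rather than reconnect.
try connection.close().wait()
```

If anything in the application (including a deinit) calls `close()` on the shared connection, that would produce exactly the "shutdown.mode=forceful" log shown above.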

CooperWolfe commented 2 years ago

It apparently was. It turns out I was registering two instances of the class that maintained the client connection with a dependency container. The first instance, which was used during the construction of client A, shut down the client connection in its deinit, which was triggered by the registration of the second instance.
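The failure mode described here can be sketched as follows (`GRPCClientProvider` and the registration calls are hypothetical names standing in for the actual app and DI container):

```swift
import GRPC
import NIOCore

// Hypothetical provider class: each instance creates and owns its
// own ClientConnection and closes it when deallocated.
final class GRPCClientProvider {
  let connection: ClientConnection

  init(group: EventLoopGroup) {
    self.connection = ClientConnection
      .insecure(group: group)
      .connect(host: "gateway.example.com", port: 443)
  }

  deinit {
    // Closing here is what produced the forceful-shutdown log at
    // launch: any client still holding this connection (client A)
    // is stranded on a terminally shut-down channel.
    try? connection.close().wait()
  }
}

let group = PlatformSupport.makeEventLoopGroup(loopCount: 1)

var registered = GRPCClientProvider(group: group)
// Client A is built from `registered.connection` (connection ID #1)...

// Registering a second instance drops the only strong reference to the
// first; its deinit runs and force-closes connection ID #1.
registered = GRPCClientProvider(group: group)
// Clients B and C are built from the new `registered.connection`
// (connection ID #2) -- hence the two different grpc_connection_ids.
```

This matches the logs: the first connection transitions idle → shutdown before any RPC is made, and RPC A keeps vending its multiplexer future from that dead connection while B and C use the replacement.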

Thank you very much for your help @glbrntt