zeroc-ice / ice

All-in-one solution for creating networked applications with RPC, pub/sub, server deployment, and more.
https://zeroc.com
GNU General Public License v2.0
2.05k stars 592 forks source link

CloseConnection and invocation cancellation / retry #3151

Open bernardnormier opened 1 week ago

bernardnormier commented 1 week ago

A connection sends a CloseConnection frame to its peer to initiate a graceful connection close (aka shutdown).

The CloseConnection frame is sent by initiateShutdown only when there is no outstanding upcall, for example:

https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L1799 https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L1767 https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L1094

For upcallCount, see: https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L2949

(it's the same in other languages)

upcallCount == 0 in particular guarantees there is no outstanding dispatch (since a dispatch is an upcall). This includes the dispatch of oneway requests.

Connection.close calls initiateShutdown only once there is no outstanding two-way invocations. See: https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L183 https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L475

but other events (OA deactivation, Communicator destruction) don't check two-way invocations and will happily initiate shutdown (= send CloseConnection) with outstanding two-way invocations: https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L148

Now, this graceful connection closure is usually fast and successful, but it can sometimes take a while and even fail.

The typical reason for taking a while is the peer is busy with its own upcalls:

In the current code, we wait until the connection closure is complete to cancel two-way invocations in the peer (= the side that didn't send CloseConnection) that are outstanding when the peer receives the CloseConnection frame: https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L1245

All these invocations are safe to cancel & retry since the CloseConnection frame guarantees none of these requests were dispatched (remember, the initiator waits until upcallCount is 0). https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/Internal/OutgoingAsync.cs#L644

This wait delays the retrying of the invocations with a new connection (to the same server or another replica). Worse, these retry-able invocations may not be retried at all if the connection closure is not graceful for some reason (typically because it times out).

We should fix this behavior by canceling these invocations sooner--as soon as we receive the CloseConnection frame.

bernardnormier commented 1 week ago

In icerpc-csharp's implementation of the ice protocol, we do cancel outstanding invocations immediately when we receive a CloseConnection frame:

https://github.com/icerpc/icerpc-csharp/blob/071848c40365c3fe204f4e1d7866c22e4300dc1a/src/IceRpc/Internal/IceProtocolConnection.cs#L1201

https://github.com/icerpc/icerpc-csharp/blob/071848c40365c3fe204f4e1d7866c22e4300dc1a/src/IceRpc/IceRpcError.cs#L27

You need the retry interceptor to retry automatically on IceRpcError.InvocationCanceled: https://docs.icerpc.dev/api/csharp/api/IceRpc.Retry.RetryInterceptor.html#IceRpc_Retry_RetryInterceptor_remarks

Without it, the caller gets the IceRpcError.InvocationCanceled error code and can then safely retry at the application level.