Open bernardnormier opened 1 week ago
In icerpc-csharp's implementation of the ice protocol, we do cancel outstanding invocations immediately when we receive a CloseConnection frame:
You need the retry interceptor to retry automatically on IceRpcError.InvocationCanceled: https://docs.icerpc.dev/api/csharp/api/IceRpc.Retry.RetryInterceptor.html#IceRpc_Retry_RetryInterceptor_remarks
Without it, the caller gets the IceRpcError.InvocationCanceled error code and can then safely retry at the application level.
A connection sends a CloseConnection frame to its peer to initiate a graceful connection close (aka shutdown).
The CloseConnection frame is sent by
initiateShutdown
only when there is no outstanding upcall, for example:https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L1799 https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L1767 https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L1094
For upcallCount, see: https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L2949
(it's the same in other languages)
upcallCount == 0 in particular guarantees there is no outstanding dispatch (since a dispatch is an upcall). This includes the dispatch of oneway requests.
Connection.close calls
initiateShutdown
only once there is no outstanding two-way invocations. See: https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L183 https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L475but other events (OA deactivation, Communicator destruction) don't check two-way invocations and will happily initiate shutdown (= send CloseConnection) with outstanding two-way invocations: https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L148
Now, this graceful connection closure is usually fast and successful, but it can sometimes take a while and even fail.
The typical reason for taking a while is the peer is busy with its own upcalls:
In the current code, we wait until the connection closure is complete to cancel two-way invocations in the peer (= the side that didn't send CloseConnection) that are outstanding when the peer receives the CloseConnection frame: https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/ConnectionI.cs#L1245
All these invocations are safe to cancel & retry since the CloseConnection frame guarantees none of these requests were dispatched (remember, the initiator waits until upcallCount is 0). https://github.com/zeroc-ice/ice/blob/fa228276c959094c6c885e297f52913f9a57324e/csharp/src/Ice/Internal/OutgoingAsync.cs#L644
This wait delays the retrying of the invocations with a new connection (to the same server or another replica). Worse, these retry-able invocations may not be retried at all if the connection closure is not graceful for some reason (typically because it times out).
We should fix this behavior by canceling these invocations sooner--as soon as we receive the CloseConnection frame.