Closed mrjerryjohns closed 2 years ago
As for the actual recovery, we should implement CASE session resumption, and always do that as the recovery step upon completion of operational node discovery. Resumption is a fairly low-cost operational that doesn't involve heavy crypto, and avoids the penalty of CASE session establishment. The alternatives here are unfortunately quite un-ideal:
I don't thing session resumption can fix any network related errors.
In case of device rebooted, if we implemented session persistent, the session resumption is able to reset the message counter. So we should implement session persistent first. W/o session persistent, the session resumption is not helpful at all.
Add an API called UpdateSession in SessionHolder that will atomically, swap out an old session with a new one. This will likely be called by CASESessionManager upon completion of session resumption and will point to the new CASE session instance. This will automatically update session holders to point to the new session transparently.
When a session is resumed, a new session with a new session key id is created, this means the existing exchanges will magically jump to the new session. In contract the spec says an exchange SHALL be bound to exactly one underlying session that will transport all associated Exchange messages for the life of that exchange.
Another solution is using a OnSessionResumed
event, handled by ReadClient
, so it can be bound to the new resumed session. Because the session resumption only works when a node is rebooted, we should invalid all existing exchanges.
I don't thing session resumption can fix any network related errors.
It doesn't, but we don't have any way to tell why we saw a timeout. It could be because of a reboot on the peer, a change in IP address, or because of a momentary network disturbance. There's no way to distinguish between these cases, so we have to take a conservative approach and re-do discovery + CASE session establishment/resumption.
So we should implement session persistent first. W/o session persistent, the session resumption is not helpful at all.
Sure. When I stated session resumption, I was implicitly denoting all dependent work items to make that feature possible. Sounds like there's more items needed here to make happen...
When a session is resumed, a new session with a new session key id is created, this means the existing exchanges will magically jump to the new session.
Is this because we have a SessionHolder
in the ExchangeContext? In that case, shouldn't the exchange be having a SessionHandle
instead?
Another solution is using a OnSessionResumed event,
We have one mechanism to update sessions in the SessionHolder
construct. Why would we make yet another construct do this?
Because the session resumption only works when a node is rebooted
So, if a node has not rebooted and still has a valid prior session, what's the behavior of a CASE session resumption message is sent to it?
Based on discussions summarized here, we'll pivot to doing the following for the IM:
Fix-up the IM re-subscription logic to permit both applications managing the session setup on each ensuing re-subscription attempt, as well as the IM possibly handling that on its own by talking directly to CASESessionManager. IM re-subscription logic should be tweaked such that the expectations of provided a policy callback should be updated to now be responsible for arming the timer as well as re-establishing the session. Default implementation can arm the timer + talk to CASESessionManager to establish new session. Applications can over-ride that and do this themselves, and call directly into the IM to re-subscribe with a given session handle.
In steady state, when a given protocol interaction times out (e.g subscription failure), it's not clear exactly who/what handles recovery from such an occurrence, and exactly what that that recovery should entail.
Some key questions surface:
Most of the questions here are from the perspective of the controller/client.
In discussions with @bzbarsky-apple, here's what we settled on as one possible strategy:
SessionRecoveryDelegate
as the logical home for this).CASESessionManager
to appropriate manage subsequent recovery logic.OnFirstMessageDeliveryFailed
. The results of that should be used to update the target address for a given session.UpdateSession
inSessionHolder
that will atomically, swap out an old session with a new one. This will likely be called byCASESessionManager
upon completion of session resumption and will point to the new CASE session instance. This will automatically update session holders to point to the new session transparently.One of the key benefactors of 5) above will be the re-subscription logic in
ReadClient
that will continue to work correctly across sessions as they get re-established/resumed.