Open randomPoison opened 4 years ago
One case study of how we're handling this is MatchState
which currently handles a lot of the work of making state changes atomic.
Pretty much all state changes are fallible, and each of them guarantees that if an error is returned no changes to the match's state will be observed. For some of them, this is done by checking all invariants first before making changes. For ones where validity can only be confirmed after making state changes, any changes are rolled back before returning an error.
In theory this should ensure that the state in MatchController
stays consistent, since MatchController
mostly delegates to MatchState
. However, the one bit of extra state management that that MatchController
performs is communicating with the client, which has the implicit state around broadcasting match updates to connected clients. This is the example shown in the original ticket: If the server changes the game state and then an error occurs before it broadcasts the changes to the client, we get stuck.
In setting up the server, I'm running into the question of how to do error handling in a way that ensures all state changes are atomic. When handling user input, we have to be careful not to partially-apply any changes before we reject an invalid input, lest we leave the server in an inconsistent state. This is easy to do by accident when using the
?
operator to propagate errors upward, as we'd like to do in most cases:The above is an example of a fairly common case that I've noticed: If we apply some changes to the server's local state and then early-return for error propagation before we broadcast the state changes to the clients, the game is left in an inconsistent state where the server thinks an action has occurred but none of the clients know about it.
The main issue here is that any changes to server state need to be atomic: They either happen completely if successful or don't happen at all if an error occurs. For any given case where this happens there's usually a fairly simple solution, such as manually checking for errors and rolling back changes if one occurs. The focus of this ticket is instead to try to track patterns around where these cases come up and determine what patterns we can employ to make it less likely that we'll accidentally write code with this kind of error.