Context
We use Logux at Voiceflow (https://voiceflow.com/) to provide a consistent multiplayer experience for designing conversations.
The frontend is react-redux with the Logux store.
Our persistence layer is MongoDB. The server updates the document for each action using MongoDB's query language, which may touch deeply nested fields within an object.
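To make the "deeply nested fields" point concrete, here is a minimal sketch of how an action could be turned into a MongoDB update document using `$set` with dot notation. The action shape (`SET_FIELD`, `path`, `value`) and the function name are hypothetical and only for illustration; they are not our real action types.

```javascript
// Hypothetical action shape: { type: 'SET_FIELD', path: string[], value: any }.
// Builds a MongoDB update document that targets one nested field.
function toMongoUpdate(action) {
  if (action.type !== 'SET_FIELD') {
    throw new Error(`Unsupported action type: ${action.type}`)
  }
  // MongoDB dot notation addresses nested fields, e.g. 'nodes.n1.data.text'.
  const fieldPath = action.path.join('.')
  return { $set: { [fieldPath]: action.value } }
}

const update = toMongoUpdate({
  type: 'SET_FIELD',
  path: ['nodes', 'n1', 'data', 'text'],
  value: 'Hello'
})
// update is { $set: { 'nodes.n1.data.text': 'Hello' } }
```

With the Node.js driver, a document like this would typically be passed as the second argument to `collection.updateOne(filter, update)`. Each action becomes its own write, so the order in which the server applies actions matters.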
Race Condition
I noticed a bug while repeatedly starting and stopping the server to see how Logux syncs offline actions.
At a high level, this is what happens:
The user is offline/disconnected and performs a large number of actions.
On reconnect, connectedMessage runs, which in turn calls syncSince. Since there are many offline actions to sync, syncSince calls sendSync, which sends out the actions and sets the node's state: this.setState('sending')
Changing the node's state to sending triggers this block of the client (the node was previously disconnected, so it resubscribes to the channels). This effectively sends out the logux/subscribe action right as all the sync messages from syncSince are being sent.
On the server, all the messages are processed by the syncMessage function. It executes them as a sequence of promises, with options.inMap potentially delaying execution before each action is added to the log and processed.
This is all fairly nondeterministic. TCP should ensure the WebSocket messages arrive in order, but the server might only partially process the syncMessage before it processes the resubscribe (logux/subscribe).
This means the channel's load function, which should send back the latest state with all the changes, can send back a state with none, or only some, of the offline sync actions applied.
This is especially problematic for us because the channel's load fetches the state from MongoDB while actions are still being processed and written to the DB in the background.
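The sequence above can be reproduced with a toy model. This is a sketch, not Logux code: `createServer`, `sync`, `load`, and the timings are hypothetical stand-ins for syncMessage (with an async options.inMap step), the channel's load, and the MongoDB document.

```javascript
// Toy model of the race. None of these names come from Logux.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms))

function createServer() {
  const db = [] // stands in for the MongoDB document

  return {
    // Like syncMessage: actions are handled one after another, and each
    // waits on an async step (standing in for options.inMap) before it
    // is applied.
    async sync(actions) {
      for (const action of actions) {
        await delay(5)
        db.push(action)
      }
    },
    // Like a channel's load: returns whatever is persisted right now.
    async load() {
      await delay(1)
      return [...db]
    }
  }
}

async function demo() {
  const offlineActions = ['a', 'b', 'c', 'd']

  // Racy: the subscribe is handled while the sync is still in flight.
  const racy = createServer()
  const syncing = racy.sync(offlineActions)
  const racyState = await racy.load()
  await syncing

  // Ordered: the subscribe is handled only after the sync has finished.
  const ordered = createServer()
  await ordered.sync(offlineActions)
  const orderedState = await ordered.load()

  return { racyState, orderedState }
}
```

In the racy case `load` resolves before all four actions have been applied, so the client receives a stale or partial state. In the ordered case it sees every action.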
Solution
This PR ensures that we only resubscribe to a channel AFTER everything is synchronized, instead of at the same time the syncMessage is being processed.
This might add a slight delay between reconnecting and reloading the state from the channel's load, but it's worth it to keep data parity.
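The idea behind the change can be sketched with a toy node. This is not the actual diff: `ToyNode`, `resubscribeOn`, and `reconnect` are hypothetical, and the 'synchronized' state name is an assumption standing in for "everything is synchronized".

```javascript
// Toy client node, not the Logux implementation.
class ToyNode {
  constructor() {
    this.state = 'disconnected'
    this.listeners = []
  }

  onState(listener) {
    this.listeners.push(listener)
  }

  setState(state) {
    if (this.state === state) return
    this.state = state
    for (const listener of this.listeners) listener(state)
  }
}

// Resubscribe once after a disconnect, when the node reaches triggerState.
function resubscribeOn(node, triggerState, sendSubscribe) {
  let wasDisconnected = node.state === 'disconnected'
  node.onState(state => {
    if (state === 'disconnected') {
      wasDisconnected = true
    } else if (wasDisconnected && state === triggerState) {
      wasDisconnected = false
      sendSubscribe()
    }
  })
}

// Simulates one reconnect and records the order of events on the wire.
function reconnect(triggerState) {
  const node = new ToyNode()
  const wire = []
  resubscribeOn(node, triggerState, () => wire.push('logux/subscribe sent'))

  node.setState('connecting')
  wire.push('sync sent') // sendSync sends the offline actions...
  node.setState('sending') // ...and then sets the state
  wire.push('sync acknowledged') // server has finished processing the sync
  node.setState('synchronized')
  return wire
}

const before = reconnect('sending')
const after = reconnect('synchronized')
```

Here `before` places the subscribe between 'sync sent' and 'sync acknowledged', which is the window where load can read a partial state. `after` places it last, at the cost of waiting for the acknowledgement.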