microsoft / FluidFramework

Library for building distributed, real-time collaborative web applications
https://fluidframework.com
MIT License
4.72k stars 532 forks source link

PropertyDDS Rebase Issue #9130

Closed dstanesc closed 2 years ago

dstanesc commented 2 years ago

Describe the bug

We document below the behavior (^0.57.0) that a PropertyDDS client (C1) is loosing pending (uncommitted) changes in case of competing clients (C2) modifying in parallel the same state. Notable is also the inconsistency in behavior - while first C2 commit was not reflected by C1 bound/displayed state, second C2 change & commit modifies the C1 state.

The fundamental question to answer is what should be the expected behavior when one of the clients (C1) has pending local changes whereas another client (C2), starting the transaction/updates after C1 is committing earlier than C1.

We believe that the correct client transactional model should reflect the snapshot isolation policy. That is queries are seeing a consistent and immutable snapshot of the local data (captured when the transaction has been started). Coupled with a traditional LWW conflict resolution policy we believe that the last transaction (client) to commit the changes should win the race.

To Reproduce

A reproducer is created in a forked repository in the examples section of the experimental PropertyDDS.

Steps to reproduce the behavior:

Clone Forked Repository

git clone git@github.com:dstanesc/FluidFramework.git FluidFramework.Debug

Build Forked Repository

cd FluidFramework.Debug
git switch pdds-uncommitted-behavior
npm run clean
npm install
npm run build:fast
alias fb='clear && node "$(git rev-parse --show-toplevel)/node_modules/.bin/fluid-build"'
fb --all @fluid-experimental/prop0

Usage

Terminal 1

npx tinylicious

Terminal 2

cd FluidFramework.Debug/experimental/PropertyDDS/examples/prop0
npm start

The default browser will open to http://localhost:8080/. Click on the red number to update the value. Click on the green commit button to commit the value.

Current Behavior

  1. Client C1 modifies the dice value. No commit.
  2. Client C2 modifies the dice value and commits. The commit is reflected locally (C2) while the C1 state remains unchanged.
  3. Client C1 commits. C1 local state reflects intended C1 change. C2 local state remains unchanged. Opening a third client (C3) would prove that the actual state of DDS is the one of C2 while C1 seems out of synch.
  4. Client C2 modifies the dice value once again and commits. This time C1 state is updated with the most recent state from C2, also meaning C1 pending changes simply vanish. Notable is also the inconsistency in behavior - while first C2 commit was not reflected by C1 state update, second C2 commit actually modifies current C1 state.

A video documenting the current behavior is provided

Expected behavior

Following above reasoning (Describe the bug section), the expected behavior is:

  1. Client C1 modifies the dice value. No commit.
  2. Client C2 modifies the dice value and commits. The commit is reflected locally (C2) while the C1 state remains unchanged.
  3. Client C1 commits. C1 local state reflects intended C1 change. C2 local state is updated to C1 state.

Debugging Insights

The investigation points to FluidFramework/experimental/PropertyDDS/packages/property-dds/src/propertyTree.ts. This seems to be a rebase problem. The method _applyRemoteChangeSet#565 is the last one invoked when handling the remote changes. Reads the pending changes and reconciles them with the remote changes: local changes win. After the rebase, the pending changes are cleared up. The commit consequently cannot see local changes anymore, they are applied only locally.

Relevant code sections:

commit#196

const changes = this._root._serialize(true, false, BaseProperty.MODIFIED_STATE_FLAGS.PENDING_CHANGE);

_applyRemoteChangeSet#565

const pendingChanges = this._root._serialize(true, false, BaseProperty.MODIFIED_STATE_FLAGS.PENDING_CHANGE);
...
const changesNeeded = this.rebaseLocalChanges(change, pendingChanges, changesToTip);
...
this._root.cleanDirty(BaseProperty.MODIFIED_STATE_FLAGS.PENDING_CHANGE);
dstanesc commented 2 years ago

Note

The current findings build upon longer lived edit sessions and manual commits. It is our belief however that same behavior will manifest, this time as more difficult to detect race conditions, even with shorter edit sessions and automated commits.

dstanesc commented 2 years ago

Note

Using older versions, such 0.49.1 the behavior appears to be more severe.

If there are uncommitted changes on one side (ie. C1) the other clients (ie C2) fail with errors on state update attempts:

this._localPrimitivePropertiesAndTemplates.item(...).getPropertyTemplate is not a function
at PropertyFactory../node_modules/@fluid-experimental/property-properties/lib/propertyFactory.js.PropertyFactory.getTemplate (main.746cb6e0cb94cd6105d7.js:45393)
at Object.getLocalOrRemoteSchema (main.746cb6e0cb94cd6105d7.js:7686)

which corrects itself after issuing the commit on pending C1 changes.

Deserves also noted that the apparently more severe behavior actually protects data integrity better as other clients do not get the chance to corrupt the pending changes form C1

milanro commented 2 years ago

Further debug insight.

As mentioned above, in the method :

FluidFramework/experimental/PropertyDDS/packages/property-dds/src/propertyTree.ts
_applyRemoteChangeSet#565

there is the cleanup of pending changes by calling : this._root.cleanDirty(BaseProperty.MODIFIED_STATE_FLAGS.PENDING_CHANGE);

However, after this call there is reapply of the previous pending changes by this._root.applyChangeSet(pendingChanges);

The execution leads to experimental/PropertyDDS/packages/property-properties/src/properties/stringProperty.js _setValue#361 (or to any other primitive property, in the dice case, it might be some number property)

This method, adds the pending operation. However it checks, whether value changed. var oldValue = this._dataArrayRef; var castedValue = String(in_value); var changed = castedValue !== oldValue; if (changed) {...

The pending changes were previously applied in the _applyRemoteChangeSet#565 method at the call this._root.applyChangeSet(changesToTip); so they do not change.

That is why they also do not repopulate the _pendingChangeswithin the _root tree and the commit will not find them.

The further exploration could lead to an option to reapply value even if it did not change (possibly enabled by flag passed through the whole stack)

Something like : if (isApplyUnchanged || changed)

For the temporary testing, we can use if (true)

AlexBerner commented 2 years ago

Thanks for the detailed description, we're having a look!

ruiterr commented 2 years ago

This sounds like a bug.

The expected behaviour you describe is correct. The C2 state should be updated after C1 has been commited.

Thanks for posting this and preparing the detailed repo case. We will investigate the bug.

milanro commented 2 years ago

Hello @ruiterr

We have an experimental fix, the changes can be found in https://github.com/dstanesc/FluidFramework/commit/d5a8ffeff3d1c8bb4ff6947db4cb37db69f080ef

The original reproducer with this fix can be found here https://github.com/dstanesc/FluidFramework/tree/fix-test-pdds-uncommitted-behavior/experimental/PropertyDDS/examples/prop0

Of course, the final fix can be elaborated differently where only PENDING flag will be applied by dedicated method (if the rest of the function, which is not needed, would be considered an overhead for example due to emitting events).

Meanwhile we discovered another race condition issue between commits concerning concurrent inserting the same key into the MapProperty. We will create ticket once we have more details.

nedalhy commented 2 years ago

Hi @dstanesc @milanro

Thank you again for reporting the issue and for investigating it. We have a fix for it in this PR, it may as well fix the issue you are having with MapProperty.

milanro commented 2 years ago

@nedalhy Thank you for the fix. I tested it and it works in my test cases.

The other problem still persists but I will create another ticket for it. It will need more complex test case so it will take some time.

Just preliminary info is that when we add into map (which in my case is under path W_0.0249997558655648".words) the same key ("1") on both clients and then commit on both clients, and then change the sub-attribute (string property) W_0.0249997558655648".words[1].person on 1st client and commit, it will broadcast to second client but then change on the second client of that sub-attribute and commit leads to exception bellow. The in_context.getPropertyContainerType() is "template" and in_context.getUserData().parallelState is undefined.

react-dom.development.js:327 Uncaught Error: PR-003: Invalid path in ChangeSet: "W_0.0249997558655648".words[1].person. Making primitive value reversible. at _recursivelyBuildReversibleChangeSet (main.531681f2e492d4b3681d.js:161279:27) at _traverseChangeSetRecursively (main.531681f2e492d4b3681d.js:167256:13) at processChange (main.531681f2e492d4b3681d.js:167319:13) at _traverseChangeSetRecursively (main.531681f2e492d4b3681d.js:167445:25) at processChange (main.531681f2e492d4b3681d.js:167319:13) at _traverseChangeSetRecursively (main.531681f2e492d4b3681d.js:167432:25) at processChange (main.531681f2e492d4b3681d.js:167319:13) at _traverseChangeSetRecursively (main.531681f2e492d4b3681d.js:167445:25) at processChange (main.531681f2e492d4b3681d.js:167319:13) at _traverseChangeSetRecursively (main.531681f2e492d4b3681d.js:167432:25)

dstanesc commented 2 years ago

@nedalhy Confirm that https://github.com/nedalhy/FluidFramework/commit/15f6542bdd6651a3c3f831e1187979fa478c94a9 fixes the bug reproducer. Thank you!

ghost commented 2 years ago

This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!