google / btree

BTree provides a simple, ordered, in-memory data structure for Go programs.
Apache License 2.0
3.9k stars 414 forks source link

Read traversal and COW #22

Open gallir opened 6 years ago

gallir commented 6 years ago

My case: I have always two clones, one for updating and the other for read traversal, every time a "transaction" is done I clone again the tree to force the next update to copy the nodes.

The problem with cow: even for this case it may occur that the reader traverses a subtree that's sill being modified (i.e. and incomplete). The cause is that mutableFor() changes t.root immediately and the changes are applied to the new root. The same happens for children derived from mutableChildren() that changes the also n.children[i].

The solution is to delay the change to the original pointer (root or children) until all changes in the subtree are finished.

I prepared a first patch wit the idea, only for t.root: https://github.com/google/btree/compare/master...gallir:cow

If it's correct and you agree I can prepare a patch for children and items if needed.

gallir commented 6 years ago

More info about the proposal.

I'm actually using https://github.com/tidwall/btree fork and implemented the entire idea on it:

  1. The equivalent to what I proposed above: https://github.com/gallir/tidwall.btree/commit/d659c208d5a646cabda77a9804cb314cc8f92cb0

  2. Replaced mutableChildren() for insert and remove: https://github.com/gallir/tidwall.btree/commit/1cae1dd6d7762bbc692a95639e720b18df628e9a

  3. Replaced mutableChildren in growChildAndRemove(): https://github.com/gallir/tidwall.btree/commit/ac1d4aba272c671b9e4d40515c3d00cff8841342

  4. Replaced mutableChildren in maybeSplitChild(): https://github.com/gallir/tidwall.btree/commit/56f9d35a1ed48dc3bf30086234bf8a905908541b

I'm testing it with real data and update from our production system (at dotw.com) and getting a lot less misses during concurrent traversals.

Although during the process I learnt that most of the time the nodes were and are not being reused with the exception of the merge node due to the check of the cow pointer if you do a clone after ever serie of updates. To improve it you have to avoid cloning and by doing it the current mechanism for freeing and reusing don't make too much sense for cases where updates are atomic/mutually exclusive.