Closed m4ushold closed 1 month ago
The changes optimize the Splay tree implementation by incorporating enhancements to prevent skewness, thereby improving performance in the crdt.Text
structure. Key updates include the addition of a height field to nodes and optimized splay operations that adapt based on the number of operations performed, ensuring a balanced tree structure. Benchmark tests and assertions are also updated for accuracy, reflecting the new output representations for the Document
structure.
Files | Change Summary |
---|---|
pkg/document/document_test.go, test/bench/document_bench_test.go | Updated assertions for ToTestString() method outputs in both tests to reflect new array formats. |
pkg/splay/splay.go, pkg/splay/splay_test.go | Enhanced Node and Tree structures with height management, optimized splay operations, and updated test assertions to match new output formats. |
sequenceDiagram
participant User
participant SplayTree
participant Node
User->>SplayTree: Insert Node
SplayTree->>Node: Create NewNode
Node->>Node: Initialize Height
SplayTree->>SplayTree: Update Height
SplayTree->>User: Return Tree Structure
Objective | Addressed | Explanation |
---|---|---|
Improve performance of Splay Tree to prevent skewness (##941) | ✅ |
In the forest where trees sway,
A splay tree danced today.
With heights anew and branches wide,
It skips the skew and takes in stride.
A hop, a leap, in code we trust,
Performance shines, it's a must! 🐇✨
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?
I’ve decided to close this PR as further research is required. The code applying max height splay caused a stack overflow during benchmarking, and I need to analyze the cause in more detail.
In addition, I found a test case that works inefficiently because the new implementation method calls the max height display function first and then the display.
Given these issues, I see a strong need for benchmarking code that can compare the new implementation with the existing one. I will prioritize this task first, and once all concerns are fully resolved, I will submit a new PR.
What this PR does / why we need it:
Why this is needed: Currently, the crdt.Text data structure uses a Splay tree. While Splay trees are efficient for performing operations in a continuous range, they have a downside where consecutively inserted elements may become linearly arranged in the tree, leading to a skew tree. When performing M operations on a tree with N nodes, the performance is generally guaranteed to be O((N+M) log N). However, if the tree becomes skewed, each operation might initially take O(N) time, though it could eventually improve. This skewness can degrade the performance of the crdt.Text data structure, making it crucial to explore ways to prevent this skewness and maintain the tree's performance and efficiency. This PR aims to address the performance and efficiency degradation in the crdt.Text data structure caused by skewness in the Splay tree.
What this PR does: In this Pull Request, we introduce a method called max_height_splay to reduce the skewness. This method involves finding the deepest leaf node and performing a splay operation on it every √n operations, where n is the number of splay operations performed. The time complexity of max_height_splay is also amortized O(log n), similar to other operations. A POC for this approach was implemented in C++, and the code can be found here.
Which issue(s) this PR fixes:
Fixes #941
Special notes for your reviewer: If operations are primarily occurring at the end of the document, performance degradation might still occur. Although this may require further research, it seems worthwhile to implement as it can provide good performance in most cases. I have added test code but did not include benchmark code. If you think it's needed, please mention me! And while adding the test code, I needed to check the height of the Splay tree nodes, which led to some modifications in other test code as well.
Does this PR introduce a user-facing change?:
Additional documentation:
Checklist:
Summary by CodeRabbit
New Features
Bug Fixes
Tests