yorkie-team / yorkie

Yorkie is a document store for collaborative applications.
https://yorkie.dev
Apache License 2.0
771 stars 143 forks

Add MaxHeightSplay for the splay tree to improve its performance by reducing its skewness. #957

m4ushold closed 1 month ago

m4ushold commented 1 month ago

What this PR does / why we need it:

Why this is needed: The crdt.Text data structure currently uses a splay tree. Splay trees are efficient when operations cluster in a contiguous range, but consecutively inserted elements can end up arranged linearly, producing a skewed tree. For M operations on a tree with N nodes, the total cost is bounded by O((N+M) log N). This is an amortized bound and does not limit individual operations: while the tree is skewed, a single operation can take O(N) time, even though later operations become cheaper as splaying restructures the tree. This skewness can degrade the performance of crdt.Text, so this PR explores a way to limit it.
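
To illustrate the skew described above, here is a minimal sketch, not the pkg/splay code. It is a bare int-keyed splay tree in which every new element is appended after the current maximum, an assumed stand-in for typing at the end of a text. All names (`node`, `tree`, `appendMax`, `skewDemo`) are hypothetical.

```go
package main

// node belongs to a minimal int-keyed splay tree (illustrative only).
type node struct {
	key    int
	ch     [2]*node // ch[0] is the left child, ch[1] the right child
	parent *node
}

type tree struct{ root *node }

// side reports which child slot of p holds x (0 = left, 1 = right).
func side(p, x *node) int {
	if p.ch[1] == x {
		return 1
	}
	return 0
}

// rotate lifts x above its parent while preserving in-order position.
func (t *tree) rotate(x *node) {
	p, g := x.parent, x.parent.parent
	d := side(p, x)
	p.ch[d] = x.ch[1-d] // x's inner subtree moves under p
	if p.ch[d] != nil {
		p.ch[d].parent = p
	}
	x.ch[1-d], p.parent, x.parent = p, x, g
	if g == nil {
		t.root = x
	} else {
		g.ch[side(g, p)] = x
	}
}

// splay moves x to the root with zig, zig-zig and zig-zag steps.
func (t *tree) splay(x *node) {
	for x.parent != nil {
		p := x.parent
		if g := p.parent; g != nil {
			if side(g, p) == side(p, x) {
				t.rotate(p) // zig-zig: rotate the parent first
			} else {
				t.rotate(x) // zig-zag: rotate x twice
			}
		}
		t.rotate(x)
	}
}

// appendMax links key after the current maximum and splays the new node.
func (t *tree) appendMax(key int) {
	n := &node{key: key}
	if t.root == nil {
		t.root = n
		return
	}
	cur := t.root
	for cur.ch[1] != nil {
		cur = cur.ch[1]
	}
	cur.ch[1], n.parent = n, cur
	t.splay(n)
}

// height counts levels iteratively, so a long chain cannot exhaust the stack.
func (t *tree) height() int {
	h := 0
	for level := []*node{t.root}; len(level) > 0 && level[0] != nil; h++ {
		var next []*node
		for _, n := range level {
			for _, c := range n.ch {
				if c != nil {
					next = append(next, c)
				}
			}
		}
		level = next
	}
	return h
}

// skewDemo appends 1..n (n must be at least 1), then splays the deepest node.
func skewDemo(n int) (before, after int) {
	t := &tree{}
	for k := 1; k <= n; k++ {
		t.appendMax(k)
	}
	before = t.height() // the tree is one left-leaning chain, so before == n
	lo := t.root
	for ; lo.ch[0] != nil; lo = lo.ch[0] {
	}
	t.splay(lo)
	return before, t.height()
}
```

Calling `skewDemo(1000)` from a `main` function reports a height of 1000 before the final splay and a clearly smaller height after it, which is the effect this PR tries to trigger deliberately.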

What this PR does: This pull request introduces a method called max_height_splay to reduce skewness. It finds the deepest leaf node and splays it once every √n operations, where n is the number of splay operations performed. Like the other splay tree operations, max_height_splay runs in amortized O(log n) time. A proof of concept was implemented in C++, and the code can be found here.
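
The trigger rule could be sketched as below. This is only one possible reading of "every √n operations", assumed for illustration: the extra splay fires once the number of splays since the previous extra splay reaches √n, with n the total number of splays so far. The names `splayScheduler`, `tick`, and `countExtraSplays` are hypothetical and do not come from the PR.

```go
package main

// splayScheduler decides when an extra "max height splay" should run.
// Assumption: the exact rule is not spelled out in the PR text; this sketch
// fires when sinceLast >= sqrt(total).
type splayScheduler struct {
	total     int // splay operations observed so far (n in the PR text)
	sinceLast int // splay operations since the last extra splay
}

// tick records one regular splay and reports whether an extra splay is due.
func (s *splayScheduler) tick() bool {
	s.total++
	s.sinceLast++
	// sinceLast >= sqrt(total), written without floating point.
	if s.sinceLast*s.sinceLast >= s.total {
		s.sinceLast = 0
		return true
	}
	return false
}

// countExtraSplays simulates ops regular splays and counts the extra ones.
func countExtraSplays(ops int) int {
	var s splayScheduler
	extra := 0
	for i := 0; i < ops; i++ {
		if s.tick() {
			extra++
		}
	}
	return extra
}
```

Under this reading the gap between extra splays widens as the operation count grows, so the number of extra splays grows roughly with √n instead of with n.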

Which issue(s) this PR fixes:

Fixes #941

Special notes for your reviewer: If operations occur primarily at the end of the document, performance may still degrade. This may need further research, but the change seems worth implementing because it should perform well in most cases. I have added test code but no benchmark code; if you think benchmarks are needed, please mention me. While adding the tests, I needed to check the height of the splay tree nodes, which led to some changes in other test code as well.

Does this PR introduce a user-facing change?:

NONE

coderabbitai[bot] commented 1 month ago

Walkthrough

The changes modify the Splay tree implementation to reduce skewness, with the aim of improving performance in the crdt.Text structure. Key updates include a new height field on nodes and splay operations that adapt to the number of operations performed, so that the tree is less likely to stay skewed. Benchmark tests and assertions are also updated to reflect the new output representation of the Document structure.
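
As a rough illustration of the height bookkeeping mentioned above, the sketch below caches a subtree height on each node and finds a deepest leaf by following the taller child at each step. It uses hypothetical names (`hnode`, `newNode`, `updateHeight`, `deepest`) and is not the actual pkg/splay/splay.go code.

```go
package main

// hnode is a hypothetical tree node carrying a cached subtree height.
type hnode struct {
	left, right *hnode
	height      int // number of levels in the subtree rooted here
}

// heightOf treats a missing child as height 0.
func heightOf(n *hnode) int {
	if n == nil {
		return 0
	}
	return n.height
}

// updateHeight recomputes n.height from its children. It has to be called
// bottom-up after any rotation or link change for the cache to stay valid.
func updateHeight(n *hnode) {
	h := heightOf(n.left)
	if r := heightOf(n.right); r > h {
		h = r
	}
	n.height = h + 1
}

// newNode builds a node over already-built children and sets its height.
func newNode(left, right *hnode) *hnode {
	n := &hnode{left: left, right: right}
	updateHeight(n)
	return n
}

// deepest follows the taller child at every step and returns a deepest leaf
// together with its depth (the root has depth 1).
func deepest(root *hnode) (*hnode, int) {
	if root == nil {
		return nil, 0
	}
	n, depth := root, 1
	for n.left != nil || n.right != nil {
		if heightOf(n.left) >= heightOf(n.right) {
			n = n.left
		} else {
			n = n.right
		}
		depth++
	}
	return n, depth
}
```

The descent is iterative and takes time proportional to the depth of the returned leaf, so it does not recurse even when the tree is a long chain.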

Changes

| Files | Change Summary |
| --- | --- |
| pkg/document/document_test.go, test/bench/document_bench_test.go | Updated assertions for ToTestString() method outputs in both tests to reflect new array formats. |
| pkg/splay/splay.go, pkg/splay/splay_test.go | Enhanced Node and Tree structures with height management, optimized splay operations, and updated test assertions to match new output formats. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant SplayTree
    participant Node

    User->>SplayTree: Insert Node
    SplayTree->>Node: Create NewNode
    Node->>Node: Initialize Height
    SplayTree->>SplayTree: Update Height
    SplayTree->>User: Return Tree Structure

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Improve performance of Splay Tree to prevent skewness (#941) | | |


m4ushold commented 1 month ago

I’ve decided to close this PR as further research is required. The code applying max height splay caused a stack overflow during benchmarking, and I need to analyze the cause in more detail.

In addition, I found a test case that performs poorly because the new implementation calls the max height splay function first and then the regular splay.

Given these issues, I see a strong need for benchmarking code that can compare the new implementation with the existing one. I will prioritize this task first, and once all concerns are fully resolved, I will submit a new PR.
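
A comparison harness along these lines might look like the following sketch. It relies on `testing.Benchmark` from the Go standard library, which can be called outside `go test`. The `inserter` interface, the `sliceText` stand-in, and `benchAppend` are hypothetical names introduced only for illustration; a real comparison would plug in the existing and the modified splay tree instead of the stand-in.

```go
package main

import "testing"

// inserter is a hypothetical common interface for the implementations
// under comparison.
type inserter interface {
	Insert(index int, value string)
}

// sliceText is a stand-in implementation so the sketch runs on its own.
type sliceText struct{ items []string }

// Insert places value at index, shifting later items to the right.
func (s *sliceText) Insert(index int, value string) {
	s.items = append(s.items, "")
	copy(s.items[index+1:], s.items[index:])
	s.items[index] = value
}

// benchAppend measures n consecutive inserts at the end of a fresh
// structure, the workload that is most likely to produce a skewed tree.
func benchAppend(newImpl func() inserter, n int) testing.BenchmarkResult {
	testing.Init() // registers testing flags; no effect if already called
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			impl := newImpl()
			for k := 0; k < n; k++ {
				impl.Insert(k, "a")
			}
		}
	})
}
```

Running `benchAppend` once per implementation with the same `n` and comparing `NsPerOp()` on the two results gives a first, coarse comparison; workloads that edit at random positions would need a separate function.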