yorkie-team / yorkie

Yorkie is a document store for collaborative applications.
https://yorkie.dev
Apache License 2.0

Cache Changes to Improve Overall PushPull/MongoDB Performance #948

Open krapie opened 1 month ago

krapie commented 1 month ago

Description:

Currently, PushPull RPC always queries MongoDB, resulting in overhead. To reduce this overhead and improve the response time of PushPull RPC, we can consider implementing a caching mechanism for Changes or Snapshot data used in PushPull operations. These data have high locality due to the nature of CRDT use cases and are immutable, making them suitable for caching.
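As a rough illustration of the idea (a minimal sketch, not Yorkie's actual API; `ChangeInfo`, `ChangeCache`, and their methods are placeholder names), an in-memory cache of immutable Changes could be consulted before falling back to MongoDB:

```go
// Minimal sketch of an in-memory Change cache keyed by document ID.
// All names here are placeholders, not Yorkie's actual types.
package cache

import "sync"

// ChangeInfo stands in for the stored change metadata.
type ChangeInfo struct {
	ServerSeq int64
	Payload   []byte
}

// ChangeCache holds immutable changes per document; since changes never
// mutate, entries only ever need eviction, not invalidation.
type ChangeCache struct {
	mu      sync.RWMutex
	changes map[string][]ChangeInfo
}

func NewChangeCache() *ChangeCache {
	return &ChangeCache{changes: map[string][]ChangeInfo{}}
}

// Get returns cached changes with ServerSeq in (from, to]. A real cache would
// also have to verify that the cached entries fully cover the requested range
// before skipping MongoDB; that check is omitted here for brevity.
func (c *ChangeCache) Get(docID string, from, to int64) ([]ChangeInfo, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	var result []ChangeInfo
	for _, ci := range c.changes[docID] {
		if ci.ServerSeq > from && ci.ServerSeq <= to {
			result = append(result, ci)
		}
	}
	return result, len(result) > 0
}

// Put records changes that were just pushed by a client or fetched from MongoDB.
func (c *ChangeCache) Put(docID string, infos []ChangeInfo) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.changes[docID] = append(c.changes[docID], infos...)
}
```

On a PushPull request, the server would try Get first, query MongoDB only on a miss, and then Put the fetched changes back for later requests.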

Why:

Implementing a cache for Changes or Snapshot data will optimize MongoDB performance and reduce response time for PushPull RPC operations. It will help improve overall system efficiency and the user experience when working with PushPull functionality.

binary-ho commented 1 month ago

@krapie Kevin, I'm interested in this issue, so I'd like to discuss it with you. Do you have any ideas about caching strategies?

  1. What is the primary goal?

    1. Reducing user response time
    2. Reducing the load on MongoDB
  2. What exactly are you caching?

    • Changes and Snapshots?
  3. When does caching occur?

    • I think caching could be based on WatchDoc (deleting documents after they are no longer watched).
    • However, to prevent immediate deletion even if the gRPC connection is temporarily lost, I believe a basic TTL (Time-to-Live) would be necessary.
    • Similar to GC, unnecessary change caches could be removed using min_synced_seq.
    • If the cache size exceeds its limit, we would need a more thoughtful eviction strategy. I believe a strategy that favors documents frequently used by a large number of users would be beneficial (a rough eviction sketch appears after this list).
    • I think the real-time behavior of the cache is crucial. As far as I know, locking is applied per API call, so if the cache is populated after the first WatchDoc request, subsequent requests should receive the cached results, which should prevent any issues.
  4. What strategies are you considering? If the goal is to reduce user response time, a local cache might be better.

    • Global Cache:

      • This could reduce the load on MongoDB or the time spent retrieving data from the server where the data is stored.
      • However, the communication time between servers would likely remain similar.
      • Storing all Changes and Snapshots in a single server's memory could consume a lot of memory.
      • If the main bottleneck is communication time, caching might be ineffective.
      • Nevertheless, this could be easier to implement and might be suitable if the primary goal is to reduce the load on MongoDB.
    • Local Cache:

      • With no external IO, user response time could be significantly faster.
      • Data is stored in the cache and periodically pushed to the database. (In this design, some changes could be lost if the server unexpectedly shuts down.)
      • If the cache is based on WatchDoc, a thundering-herd problem could occur when servers are added or go down, leading to split-brain scenarios. (However, this may not be a critical issue depending on the number of users.)
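To make the TTL and min_synced_seq ideas from item 3 above concrete, here is a rough eviction sketch (hypothetical names, not Yorkie's API): documents unwatched for longer than the TTL are dropped entirely, and for the rest only changes above min_synced_seq are kept, mirroring server-side GC.

```go
// Rough eviction sketch for the WatchDoc/TTL/min_synced_seq ideas above.
// All names are placeholders.
package cache

import "time"

type cachedChange struct {
	serverSeq int64
	payload   []byte
}

type docEntry struct {
	changes     []cachedChange
	lastWatched time.Time // updated whenever a WatchDoc stream attaches or detaches
}

type EvictingCache struct {
	entries map[string]*docEntry
	ttl     time.Duration
}

// Evict removes data that PushPull can no longer need. minSyncedSeq maps a
// document ID to the min_synced_seq value used by GC.
func (c *EvictingCache) Evict(now time.Time, minSyncedSeq map[string]int64) {
	for docID, e := range c.entries {
		// Drop the whole document if nobody has watched it for longer than the TTL.
		if now.Sub(e.lastWatched) > c.ttl {
			delete(c.entries, docID)
			continue
		}
		// Otherwise keep only changes above min_synced_seq, like server-side GC.
		kept := e.changes[:0]
		for _, ch := range e.changes {
			if ch.serverSeq > minSyncedSeq[docID] {
				kept = append(kept, ch)
			}
		}
		e.changes = kept
	}
}
```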
krapie commented 1 month ago

@binary-ho Well, this issue is just a conceptual thought that I have, and I think we need to discuss the necessity and the benefits of this feature. But this is a very fun issue to discuss, so maybe starting with a PoC would be a good way to go.

For your questions:

  1. What is the primary goal?

Reducing the load on MongoDB is the primary goal, but I'm expecting reduced response time as well.

  2. What exactly are you caching?

Primarily Changes, but we need to check the MongoDB query pattern.

  3. When does caching occur?

We need to brainstorm about the caching strategy; I do not have any concrete ideas for now.

  4. What strategies are you considering? If the goal is to reduce user response time, a local cache might be better.

I'm considering local in-memory caching because we do not need a global cache in cluster mode, which does not share document workload across servers.
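As a rough sketch of that direction (assuming the hashicorp/golang-lru/v2 library purely for illustration; `findChangesFromMongo`, `PushPullCache`, and the value type are placeholders for the real persistence calls), each server could keep a bounded LRU cache that PushPull consults before querying MongoDB:

```go
// Per-server local cache sketch. The LRU library and all helper names here are
// assumptions for illustration, not what Yorkie actually uses.
package server

import (
	lru "github.com/hashicorp/golang-lru/v2"
)

type docKey string

// cachedChanges is a placeholder for whatever PushPull needs per document.
type cachedChanges struct {
	serverSeqTo int64
	payloads    [][]byte
}

type PushPullCache struct {
	lru *lru.Cache[docKey, cachedChanges]
}

func NewPushPullCache(size int) (*PushPullCache, error) {
	c, err := lru.New[docKey, cachedChanges](size)
	if err != nil {
		return nil, err
	}
	return &PushPullCache{lru: c}, nil
}

// ChangesFor returns cached changes when the cached range already covers the
// request, and otherwise falls back to MongoDB and refreshes the cache.
func (p *PushPullCache) ChangesFor(key docKey, to int64) (cachedChanges, error) {
	if v, ok := p.lru.Get(key); ok && v.serverSeqTo >= to {
		return v, nil
	}
	v, err := findChangesFromMongo(key, to)
	if err != nil {
		return cachedChanges{}, err
	}
	p.lru.Add(key, v)
	return v, nil
}

// findChangesFromMongo is a stand-in for the actual MongoDB query in PushPull.
func findChangesFromMongo(key docKey, to int64) (cachedChanges, error) {
	// A real implementation would query the changes collection here.
	return cachedChanges{serverSeqTo: to}, nil
}
```

Because each server would only cache the documents it serves in cluster mode, memory stays bounded by the LRU size, and since changes are immutable, no cross-server invalidation would be needed.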