yorkie-team / yorkie

Yorkie is a document store for collaborative applications.
https://yorkie.dev
Apache License 2.0

Provide DocEvent Webhook #1002

Open hackerwins opened 2 months ago

hackerwins commented 2 months ago

What would you like to be added:

We are currently implementing LLM-based document search in CodePair. As part of this, we need to maintain vector representations of document content in a vector store. It's crucial that any update to a document is reflected in the vector store, so the stored content must be kept in sync as the document is edited.

To achieve this, we require a mechanism that notifies external services like CodePair when documents are modified in Yorkie. We propose the introduction of a Webhook system that triggers when a document event occurs.

Specifically, we suggest that when handling a PushPullChanges request, the server should check whether a Webhook for the DocEvent is registered for the project. If one is, the server would call that Webhook during the background routine of the PushPullChanges API execution, right before publishing the DocEvent.
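To make the proposed flow concrete, here is a minimal sketch of the "check, then call" step. It is not Yorkie's actual code: `Project`, `EventWebhookURL`, `DocEvent`, and `NotifyDocEvent` are hypothetical names chosen for illustration, and the payload fields are assumptions.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Project is a hypothetical stand-in for the project settings.
type Project struct {
	ID              string
	EventWebhookURL string // assumed field; empty means no webhook is registered
}

// DocEvent is a hypothetical payload, not Yorkie's actual event type.
type DocEvent struct {
	Type     string    `json:"type"`
	DocKey   string    `json:"docKey"`
	IssuedAt time.Time `json:"issuedAt"`
}

// HasWebhook reports whether the project registered a DocEvent webhook.
func (p Project) HasWebhook() bool { return p.EventWebhookURL != "" }

// NotifyDocEvent posts the event to the project's webhook, if one is registered.
// It is meant to be called from a background routine, before publishing the event.
func NotifyDocEvent(ctx context.Context, c *http.Client, p Project, ev DocEvent) error {
	if !p.HasWebhook() {
		return nil // nothing to do for projects without a webhook
	}
	body, err := json.Marshal(ev)
	if err != nil {
		return fmt.Errorf("marshal event: %w", err)
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, p.EventWebhookURL, bytes.NewReader(body))
	if err != nil {
		return fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		return fmt.Errorf("send webhook: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return fmt.Errorf("webhook returned status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	p := Project{ID: "p1"} // no webhook registered, so no request is sent
	ev := DocEvent{Type: "DocumentChanged", DocKey: "doc-1", IssuedAt: time.Now()}
	err := NotifyDocEvent(context.Background(), http.DefaultClient, p, ev)
	fmt.Println("error:", err)
}
```

Returning early when no webhook is registered keeps the cost for projects that do not use the feature to a single string comparison.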

I think it will have a structure similar to the Auth Webhook. If changes occur frequently, an event throttling mechanism such as debouncing will also be needed.
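As a rough illustration of debouncing, the sketch below coalesces a burst of changes to the same document into one trailing call. `Debouncer`, `Trigger`, and `countDebounced` are hypothetical names, and the 50 ms quiet period is an arbitrary value picked for the demo, not a proposed default.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Debouncer coalesces bursts of calls per key into a single trailing call.
type Debouncer struct {
	mu     sync.Mutex
	wait   time.Duration
	timers map[string]*time.Timer
}

// NewDebouncer creates a debouncer with the given quiet period.
func NewDebouncer(wait time.Duration) *Debouncer {
	return &Debouncer{wait: wait, timers: make(map[string]*time.Timer)}
}

// Trigger schedules fn to run once key has been quiet for d.wait.
// A pending call for the same key is superseded by the new one.
func (d *Debouncer) Trigger(key string, fn func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if old, ok := d.timers[key]; ok {
		old.Stop()
	}
	var t *time.Timer
	t = time.AfterFunc(d.wait, func() {
		d.mu.Lock()
		current := d.timers[key] == t // false if a newer Trigger replaced this timer
		if current {
			delete(d.timers, key)
		}
		d.mu.Unlock()
		if current {
			fn()
		}
	})
	d.timers[key] = t
}

// countDebounced fires n rapid triggers for one document key and
// returns how many times the callback actually ran.
func countDebounced(n int) int {
	var mu sync.Mutex
	calls := 0
	d := NewDebouncer(50 * time.Millisecond)
	for i := 0; i < n; i++ {
		d.Trigger("doc-1", func() {
			mu.Lock()
			calls++
			mu.Unlock()
		})
	}
	time.Sleep(300 * time.Millisecond) // wait well past the quiet period
	mu.Lock()
	defer mu.Unlock()
	return calls
}

func main() {
	fmt.Println("webhook calls for 5 rapid changes:", countDebounced(5))
}
```

Keying the timers by document means a busy document does not delay notifications for a quiet one. A pure trailing debounce can postpone delivery indefinitely under continuous edits, so a real design might also want a maximum wait.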

Why is this needed:

This enhancement would enable seamless integration with external services, allowing real-time updates to a search engine or vector store based on document changes in Yorkie, and thereby improving the document management and search capabilities of our application.

window9u commented 2 months ago

Hello! Could I try this issue?

window9u commented 6 hours ago

What Events Should We Send?

To keep external services informed about the state of documents, we should send events corresponding to the CRUD (Create, Read, Update, Delete) operations.

Common Webhook Specifications

Event Types and Payloads

a. Document Created

b. Document Watched

c. Document Unwatched

d. Document Changed

1. Change Event
2. Snapshot Event

e. Document Deleted

window9u commented 6 hours ago

Explanation of Events

Common Parts

Document Watched / Unwatched

Document Changed

Change Event
Snapshot Stored

window9u commented 6 hours ago

Once the above data types are finalized, we should consider the following:

  1. Where to Send the Data
    • Adding an Endpoint Attribute to the Project: We need to include an endpoint property in the project configuration to specify where the data should be sent.
    • Defining Endpoint Properties: We should define various properties of the endpoint, such as the debouncing period, snapshot period, or how frequently to send data (e.g., after a certain number of changes).
    • Batching Events: It might be possible to send events in batches (if CodePair processes them in batches). Therefore, we need to discuss batching strategies.
    • Security Considerations: Determine how to handle security, such as how users can verify that the Yorkie server is the one sending the data.
  2. How to Handle Exceptions
    • Timeout Settings for Requests: Decide on a timeout setting for individual requests.
    • Handling Unresponsive Endpoints: Determine what to do if the endpoint repeatedly fails to receive requests.
    • Storing Unsent Events: Should we store events that the endpoint failed to receive? Or should we allow users to choose whether or not to store them? If we decide to store them, where should we store them?
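To ground the security and timeout questions above, here is one possible sketch: the sender signs each body with HMAC-SHA256 using a shared secret, applies a per-request timeout, and retries a bounded number of times with exponential backoff. None of this is decided in the thread. The `X-Signature-256` header name, the `Sign`, `Verify`, and `Send` functions, and the backoff values are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"context"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// Sign returns the hex-encoded HMAC-SHA256 of body under the shared secret.
func Sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify is what a receiver could run to confirm the sender knows the secret.
func Verify(secret, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), got) // constant-time comparison
}

// Send posts body with a signature header, a per-attempt timeout,
// and up to maxAttempts tries with exponential backoff between them.
func Send(ctx context.Context, url string, secret, body []byte, timeout time.Duration, maxAttempts int) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	client := &http.Client{Timeout: timeout}
	backoff := 100 * time.Millisecond // arbitrary starting delay
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
		if err != nil {
			return err
		}
		req.Header.Set("Content-Type", "application/json")
		req.Header.Set("X-Signature-256", "sha256="+Sign(secret, body)) // hypothetical header name
		resp, err := client.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode >= 200 && resp.StatusCode < 300 {
				return nil
			}
			err = fmt.Errorf("status %d", resp.StatusCode)
		}
		lastErr = err
		if attempt == maxAttempts {
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		backoff *= 2
	}
	return fmt.Errorf("webhook failed after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	secret := []byte("shared-secret")
	// A local test server stands in for the external endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		sig := strings.TrimPrefix(r.Header.Get("X-Signature-256"), "sha256=")
		if !Verify(secret, body, sig) {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	err := Send(context.Background(), srv.URL, secret, []byte(`{"type":"DocumentChanged"}`), 2*time.Second, 3)
	fmt.Println("delivered:", err == nil)
}
```

When `Send` returns an error after the final attempt, the caller reaches the open question from the last bullet: drop the event, or hand it to some persistent store for later redelivery.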