microsoft / FluidFramework

Library for building distributed, real-time collaborative web applications
https://fluidframework.com
MIT License

Search integration #2738

Closed vladsud closed 3 years ago

vladsud commented 4 years ago

Expose a way for components to supply alternative ".search" blobs when generating a component snapshot, so that components can provide a searchable stream for the service to index.
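To make the request concrete, here is a minimal sketch of what an optional ".search" blob in a component snapshot could look like. Every name in it (`SnapshotTree`, `SearchableComponent`, `getSearchContent`, `snapshotComponent`) is hypothetical and chosen for illustration; none of them is an existing Fluid Framework API.

```typescript
// Hypothetical shapes for illustration only; these are not Fluid Framework APIs.
interface SnapshotTree {
    // Blob name -> serialized blob content.
    blobs: Record<string, string>;
}

interface SearchableComponent {
    id: string;
    // Serializes the component's full state for the regular snapshot.
    serialize(): string;
    // Optional: plain text the service may index. Omit it to opt out of search.
    getSearchContent?(): string;
}

const SEARCH_BLOB_NAME = ".search";

// Builds the snapshot and adds a ".search" blob only when the component supplies one.
function snapshotComponent(component: SearchableComponent): SnapshotTree {
    const tree: SnapshotTree = { blobs: { content: component.serialize() } };
    const searchText = component.getSearchContent?.();
    if (searchText !== undefined) {
        tree.blobs[SEARCH_BLOB_NAME] = searchText;
    }
    return tree;
}

// A component that opts in to search.
const note: SearchableComponent = {
    id: "note-1",
    serialize: () => JSON.stringify({ title: "Groceries", items: ["milk", "eggs"] }),
    getSearchContent: () => "Groceries milk eggs",
};

// A component that does not provide searchable text.
const counter: SearchableComponent = {
    id: "counter-1",
    serialize: () => JSON.stringify({ value: 3 }),
};

const noteSnapshot = snapshotComponent(note);
const counterSnapshot = snapshotComponent(counter);
```

In this sketch the regular snapshot content is unaffected by search: a component that does not implement the optional method simply produces a snapshot without the extra blob.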

ksbrar commented 4 years ago

Search Tasks Overview (September 2020)

Last updated: 09/23/20

The Goals of Search

The primary purpose of the search task is to let the Fluid Framework runtime offer text extraction and cataloguing that is more robust than what is available to the developer by default. Currently, a developer who wants to build, for example, a custom search indexer that stays up to date with changes to DataObject(s) over time (or any similar system that relies on polling for a specific class of changes) has to rely on raw op log processing or resort to other inscrutable tricks. If we add a capability that lets a DataObject periodically (i.e. on every snapshot) update an easily accessible "blob" of information, this task becomes much easier. Furthermore, the solution for "search" could be generalized to a whole class of "augmentation" tasks that need to periodically poll for information that may or may not be text.

Search Design

There are essentially two components to search:

  1. The client (e.g. the DataObject, the developer, etc.) curates a text representation that will be saved on every snapshot/summary.
    • The reasonable way to do this is for a DataObject to define a callback that can be invoked by the server.
  2. The server (e.g. routerlicious, tinylicious, etc.) grabs the client's text representation at the time of summary and pipes it to the desired indexer.
    • In certain situations, this will require extra infrastructure to support the client's text extraction. Otherwise, the server simply looks for the client's optional representation and ignores the search task if it is not provided.
    • The client's text representation is stored as a Blob on the MergeTree.
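
The server half of this design could be sketched roughly as follows. This assumes the summary arrives as a flat map of blob names to text and that the optional representation lives under the name ".search"; the `Summary`, `Indexer`, `InMemoryIndexer`, and `onSummary` names are hypothetical and are not routerlicious or tinylicious APIs.

```typescript
// Hypothetical types for illustration only; not routerlicious or tinylicious APIs.
interface Summary {
    documentId: string;
    sequenceNumber: number;
    // Blob name -> text content, as uploaded by the client at summary time.
    blobs: Record<string, string>;
}

interface Indexer {
    index(documentId: string, sequenceNumber: number, text: string): void;
}

// Stand-in for a real indexer: keeps the newest text seen for each document.
class InMemoryIndexer implements Indexer {
    readonly entries = new Map<string, { sequenceNumber: number; text: string }>();

    index(documentId: string, sequenceNumber: number, text: string): void {
        const existing = this.entries.get(documentId);
        // Ignore summaries that are older than what is already indexed.
        if (existing === undefined || existing.sequenceNumber < sequenceNumber) {
            this.entries.set(documentId, { sequenceNumber, text });
        }
    }
}

// Called once per summary. Returns true if the summary carried searchable text.
function onSummary(summary: Summary, indexer: Indexer): boolean {
    const text: string | undefined = summary.blobs[".search"];
    if (text === undefined) {
        // The client did not provide a representation: skip the search task.
        return false;
    }
    indexer.index(summary.documentId, summary.sequenceNumber, text);
    return true;
}

const indexer = new InMemoryIndexer();
const indexed = onSummary(
    { documentId: "doc-1", sequenceNumber: 10, blobs: { content: "{}", ".search": "hello world" } },
    indexer,
);
const skipped = onSummary(
    { documentId: "doc-2", sequenceNumber: 4, blobs: { content: "{}" } },
    indexer,
);
```

Keeping the blob optional means the server needs no knowledge of individual DataObjects: it either finds a representation to forward or does nothing.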

Current Work Items

vladsud commented 3 years ago

All search work (except for GC) is done, so closing this issue. @ksbrar - you probably want to close your PR. I've added similar functionality through mixins, and while a small amount of code is in our repo, most of the code (the knowledge about the search contract) is in the Bohemia repo.