vladsud commented 4 years ago

Expose a way for components to supply alternative ".search" blobs when generating a component snapshot, so that components can provide a searchable stream for the service to index.

ksbrar commented 4 years ago

Search Tasks Overview (September 2020)

Last updated: 09/23/20

The Goals of Search

The primary purpose of the search task is to enable the Fluid Framework runtime to offer text extraction and cataloguing that is more robust than what is available by default to the developer. Currently, if a developer, for example, wanted to design a custom search indexer that could keep up to date with changes to DataObject(s) over time (or any similar system that relies on polling for a specific class of changes over time), then they would need to rely on raw op log processing or resort to other inscrutable tricks. If we add a capability that lets a DataObject periodically (i.e. on every snapshot) update an easily accessible "blob" of information, then this task becomes much easier. Furthermore, the solution for "search" could be generalized for a whole class of "augmentation" tasks that need to periodically poll for information that may or may not be text.

Search Design

There are essentially two components to search:

- The client (e.g. the DataObject, the developer, etc.) curates a text representation that will be saved on every snapshot/summary.
  - The reasonable way to do this is for a DataObject to define a callback that can be invoked by the server.
- The server (e.g. routerlicious, tinylicious, etc.) grabs the client's text representation at the time of summary and pipes it to the desired indexer.
  - In certain situations, this will require extra infrastructure to support the client's text extraction. Otherwise, the server simply looks for the client's optional representation and ignores the search task if it is not provided.
  - The client's text representation is stored as a Blob on the MergeTree.
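The client half of this design can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchDataStoreRuntime`, `registerSearchCallback`, and `SnapshotEntry` are hypothetical names invented here, not the Fluid Framework API, and the blob name `.search` is taken from the issue description above.

```typescript
// Hypothetical types for illustration only; this is not the Fluid Framework API.
type SearchCallback = () => string | undefined;

interface SnapshotEntry {
    path: string;
    contents: string;
}

// Blob name taken from the issue description; everything else is assumed.
const searchBlobPath = ".search";

class SketchDataStoreRuntime {
    private readonly channels: Record<string, string>;
    private searchCallback: SearchCallback | undefined;

    constructor(channels: Record<string, string>) {
        this.channels = channels;
    }

    // The DataObject registers a callback that produces its search text.
    public registerSearchCallback(callback: SearchCallback): void {
        this.searchCallback = callback;
    }

    // On every snapshot, regular channel contents are written first, then
    // the optional search blob is appended if the callback returns text.
    public snapshotInternal(): SnapshotEntry[] {
        const entries: SnapshotEntry[] = [];
        for (const path of Object.keys(this.channels)) {
            entries.push({ path, contents: this.channels[path] ?? "" });
        }
        const searchText = this.searchCallback ? this.searchCallback() : undefined;
        if (searchText !== undefined && searchText.length > 0) {
            entries.push({ path: searchBlobPath, contents: searchText });
        }
        return entries;
    }
}

const runtime = new SketchDataStoreRuntime({ text: '{"segments":["Hello ","world"]}' });
const withoutSearch = runtime.snapshotInternal();

runtime.registerSearchCallback(() => "Hello world");
const withSearch = runtime.snapshotInternal();
```

Because the blob is only written when a callback is registered and returns text, a server that does not find `.search` can simply skip the search task, as described in the list above.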

Current Work Items

[ ] Basic search infrastructure/test.
- Details: In dataStoreRuntime.ts's snapshotInternal() method, add a task that invokes a DataObject's callback to check if there is any search text to be appended to the snapshot tree.
- [x] Choose/develop a DataObject for testing search, use a callback to populate a search blob
  - Solution: use shared-text for this test.
- [x] Add infrastructure in dataStoreRuntime for invoking callbacks to the DataObject.
  - Solution: add an extra method to the FluidDataStoreRuntime class to register a callback to a DataObject and keep a private reference to it, to be invoked later during snapshotInternal().
- [ ] Pending work: fix bug with latest build of runtime
[ ] Basic infrastructure to transfer search text from "blob" to indexer
- [ ] Parse and extract search representation stored in snapshot tree.
- [ ] Send to indexer (e.g. Fluid preview backend)
[ ] Upgrade server code to support concatenating search representations from multiple sources
- [ ] Capture search text from multiple DataObjects, or a DataObject within a DataObject
- [ ] Work on efficient transfer of this data
[ ] Naming conventions, search-specific trees (TBD)
[ ] Generalize search code to just be another capability of FCL layer.
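For the server-side items (extracting the search representation from the snapshot tree and concatenating text from multiple or nested DataObjects), one possible shape is a recursive walk that collects every `.search` blob. The tree layout below (`SketchTree`, with `blobs` and `trees` maps) is an assumption made for this sketch and does not reflect the actual snapshot format.

```typescript
// Hypothetical snapshot shape for illustration; not the actual Fluid tree types.
interface SketchTree {
    blobs: Record<string, string>;
    trees: Record<string, SketchTree>;
}

interface SearchDocument {
    path: string;
    text: string;
}

// Walks the tree depth-first and collects every ".search" blob, including
// those belonging to a DataObject nested inside another DataObject.
function collectSearchBlobs(tree: SketchTree, path: string = ""): SearchDocument[] {
    const found: SearchDocument[] = [];
    const text = tree.blobs[".search"];
    if (text !== undefined) {
        found.push({ path: path || "/", text });
    }
    // Sort child names so the concatenated output is deterministic.
    for (const name of Object.keys(tree.trees).sort()) {
        const child = tree.trees[name];
        if (child !== undefined) {
            found.push(...collectSearchBlobs(child, `${path}/${name}`));
        }
    }
    return found;
}

const snapshot: SketchTree = {
    blobs: {},
    trees: {
        notes: {
            blobs: { ".search": "meeting notes" },
            trees: {
                table: { blobs: { ".search": "Q3 budget" }, trees: {} },
            },
        },
        canvas: { blobs: { header: "binary ink data" }, trees: {} },
    },
};

const documents = collectSearchBlobs(snapshot);
// One combined payload that could be handed to an indexer.
const combined = documents.map((d) => d.text).join("\n");
```

Keeping the originating path alongside each piece of text would let an indexer attribute a hit to a specific DataObject. Whether that is needed depends on the indexer contract, which this issue leaves open.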

vladsud commented 3 years ago

All search work (except for GC) is done, so I'm closing this issue. @ksbrar - you probably want to close your PR. I've added similar functionality through mixins, and while a small amount of code is in our repo, most of it (the knowledge about the search contract) is in the Bohemia repo.

microsoft / FluidFramework

Search integration #2738
