lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License

Generating (caching) an index during build-time rather than runtime #254

Closed by timelytree 5 months ago

timelytree commented 5 months ago

Hey @lucaong 👋

I've got an interesting situation → I've got a dataset that takes several seconds to index in the browser. Because of this, the main thread freezes while indexing occurs, and the rest of the DOM is only rendered once indexing has completed. The delay in page load is noticeable enough to require a workaround or a different solution.

I just had a thought → it would be useful to be able to cache an index server-side (or during static site generation) and then load the index client-side during page load. Essentially, indexing at build time rather than on every page load at runtime. Another option would be to create a web worker, generate the index in the web worker, and then postMessage the index back to the main thread and feed it into minisearch. All of these ideas are the same at their core → exporting/importing a pre-generated index.

Do you think this is a possibility for the way you've got minisearch structured already?

timelytree commented 5 months ago

Aaaand of course I missed it in the docs 🤦

Thanks otherwise for an incredible little library!!
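
For anyone landing here with the same question: the MiniSearch docs cover this by serializing the index with JSON.stringify(miniSearch) at build time and restoring it with MiniSearch.loadJSON(json, options), passing the same options used to build it. The sketch below shows that build-then-load pattern with a toy index so that it runs without dependencies. ToyIndex and its serialized format are made up for illustration and are not MiniSearch's internals.

```typescript
// Toy stand-in for a search index (hypothetical; NOT MiniSearch's format).
type Doc = { id: number; text: string }

class ToyIndex {
  // Maps each term to the ids of the documents containing it.
  private postings = new Map<string, number[]>()

  add(doc: Doc): void {
    for (const term of doc.text.toLowerCase().split(/\W+/)) {
      if (term === '') continue
      const ids = this.postings.get(term) ?? []
      if (ids.indexOf(doc.id) === -1) ids.push(doc.id)
      this.postings.set(term, ids)
    }
  }

  search(term: string): number[] {
    return this.postings.get(term.toLowerCase()) ?? []
  }

  // Called automatically by JSON.stringify(index).
  toJSON(): Record<string, number[]> {
    const out: Record<string, number[]> = {}
    this.postings.forEach((ids, term) => { out[term] = ids })
    return out
  }

  // Rebuild the index from serialized data without re-tokenizing documents.
  static loadJSON(json: string): ToyIndex {
    const index = new ToyIndex()
    const data = JSON.parse(json) as Record<string, number[]>
    for (const term of Object.keys(data)) index.postings.set(term, data[term])
    return index
  }
}

// Build time (e.g. a Node script run during static site generation):
const built = new ToyIndex()
built.add({ id: 1, text: 'Tiny full-text search' })
built.add({ id: 2, text: 'Search in the browser' })
const serialized = JSON.stringify(built) // write this string to a file

// Run time (in the browser): fetch the file and load it instead of re-indexing.
const loaded = ToyIndex.loadJSON(serialized)
```

The expensive part (tokenizing and building the postings) happens once at build time. The client only pays for downloading and parsing the JSON, which is typically much cheaper than re-indexing, although a large serialized index still takes time to transfer and parse.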

rolftimmermans commented 5 months ago

A project of mine faced the same issue. You can do this without a worker thread: index in batches and, after each batch, yield to the event loop with setTimeout before continuing with the next one. This lets the browser run other code (rendering, input handling) in between, so the page does not freeze.

You can use my code:

// Number of items passed to the callback per batch.
const N = 8

export class Queue<T> {
  private data: T[] = []
  // ReturnType<typeof setTimeout> type-checks in both browser and Node builds.
  private timer: ReturnType<typeof setTimeout> | undefined

  constructor(
    private fn: (items: T[]) => void,
    public delay = 250,
  ) {}

  add(item: T) {
    this.data.push(item)
    if (this.timer === undefined) {
      this.timer = setTimeout(this.start, this.delay)
      // Only the very first run waits for the initial delay; later runs start right away.
      this.delay = 0
    }
  }

  // Process items in batches of N. Keep processing batches back to back until
  // about 30 ms have elapsed, then yield to the event loop before continuing.
  private start = () => {
    const start = Date.now()

    do {
      const items = this.data.splice(0, N)
      this.fn(items)
    } while (this.data.length && Date.now() - start < 30)

    if (this.data.length) {
      // Schedule the next batch.
      this.timer = setTimeout(this.start, 0)
    } else {
      // Nothing more to do. Leave any future scheduling to add().
      this.timer = undefined
    }
  }
}

Use the queue as follows:

const indexableItems: unknown[] = [] // All items to index
const queue = new Queue(items => {
  // Index batch of items here
})

// Add all indexable items one by one.
// (Or you can add a method to Queue that takes all items at once.)
for (const item of indexableItems) {
  queue.add(item)
}
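
The "takes all items at once" variant mentioned above could look roughly like the sketch below. It is a standalone, hypothetical helper (processInBatches, its parameters, and the injectable schedule function are not part of MiniSearch or of the Queue class), and it returns a Promise so the caller can tell when indexing has finished.

```typescript
// Hypothetical helper; names and defaults are assumptions for illustration.
type Schedule = (callback: () => void) => void

function processInBatches<T>(
  items: readonly T[],
  fn: (batch: T[]) => void,
  batchSize = 8,
  budgetMs = 30,
  // Injectable so the scheduling can be swapped out (e.g. in tests).
  schedule: Schedule = callback => { setTimeout(callback, 0) },
): Promise<void> {
  return new Promise((resolve, reject) => {
    let offset = 0

    const run = () => {
      const start = Date.now()
      try {
        // Process at least one batch, then continue until the time budget is spent.
        do {
          fn(items.slice(offset, offset + batchSize))
          offset += batchSize
        } while (offset < items.length && Date.now() - start < budgetMs)
      } catch (error) {
        reject(error)
        return
      }

      if (offset < items.length) {
        schedule(run) // Yield to the event loop before the next slice.
      } else {
        resolve()
      }
    }

    if (items.length === 0) resolve()
    else schedule(run)
  })
}
```

Assuming a MiniSearch instance named miniSearch and an array named documents, usage would be along the lines of `await processInBatches(documents, batch => miniSearch.addAll(batch))`. Searches issued before the promise resolves only see the documents indexed so far.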