olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Storage Events #30

Closed olivernn closed 11 years ago

olivernn commented 11 years ago

Adding storage events to a lunr.Index would make it easy to snapshot an index to some storage location, whether that is localStorage in the browser, a file, or some other database on the server.

I think lunr would have to emit three events: add, update and remove. This should give users enough hooks to maintain a persisted copy of their index.

It might work like this:

index.on('add', function (doc, index) {
    // doc is the newly added document
    // index is the instance of `lunr.Index` that has been added to

    localStorage.setItem('asdf', JSON.stringify(index))
})

The callback signature would be the same for the add, update and remove events.

Because update is implemented by first removing a document and then re-adding it, some special care would be needed to make sure that only an update event is fired, rather than a remove and then an add event, but this should be simple enough.

To support this, all three methods (add, update and remove) could take an argument that prevents any events from being emitted. This might also be useful when doing a bulk load.

I don't think event handlers should be serialised, so when loading an index, event handlers would have to be re-added.
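
To make the proposal above concrete, here is a minimal sketch of how the event rules could behave. It does not use lunr itself: `TinyIndex` is a hypothetical stand-in, and the `silent` argument name is an assumption standing in for "an argument that prevents any events from being emitted".

```javascript
// Toy stand-in for lunr.Index; TinyIndex and its `silent` flag are hypothetical.
class TinyIndex {
  constructor() {
    this.docs = new Map()   // id -> document
    this.handlers = {}      // event name -> array of callbacks
  }

  on(name, fn) {
    (this.handlers[name] = this.handlers[name] || []).push(fn)
  }

  emit(name, doc) {
    (this.handlers[name] || []).forEach(fn => fn(doc, this))
  }

  add(doc, silent) {
    this.docs.set(doc.id, doc)
    if (!silent) this.emit('add', doc)
  }

  remove(doc, silent) {
    this.docs.delete(doc.id)
    if (!silent) this.emit('remove', doc)
  }

  update(doc, silent) {
    // Remove and re-add silently so that only 'update' fires.
    this.remove(doc, true)
    this.add(doc, true)
    if (!silent) this.emit('update', doc)
  }
}

const index = new TinyIndex()
const seen = []
index.on('add', doc => seen.push('add:' + doc.id))
index.on('update', doc => seen.push('update:' + doc.id))
index.on('remove', doc => seen.push('remove:' + doc.id))

index.add({ id: 1, body: 'first' })
index.update({ id: 1, body: 'edited' })
index.add({ id: 2, body: 'bulk' }, true)   // bulk load: no event fired
```

With this shape, the update call records a single update event, and the silenced add changes the index without notifying any handler.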

Venemo commented 11 years ago

Is it a good idea to have to serialize the entire index on every change? How would that scale if you have a large index?

garysieling commented 11 years ago

The way to resolve that might be to separate the concepts of 'add' and 'commit'; that way you'd have more control over when the write operations happen.

Venemo commented 11 years ago

Instead of making the events work on a per-document basis, they should be organized in a way that helps persist the index change by change, without the need to serialize the whole index each time.

Venemo commented 11 years ago

If it only works by serializing the whole index, that will prevent lunr from being used in scenarios where the amount of data makes that impractical.

garysieling commented 11 years ago

One way around that might be to structure the index so you can keep track of changes, and have a commit operation where you just save back the modifications.
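
A rough sketch of the change-tracking idea, under some loud assumptions: `ChangeTracker` is a hypothetical helper, the store is a plain `Map` standing in for real storage, and for simplicity it persists the changed documents rather than pieces of the inverted index.

```javascript
// Hypothetical helper: collects changes and writes only those on commit().
class ChangeTracker {
  constructor(store) {
    this.store = store         // anything with set(key, value) and delete(key)
    this.pending = new Map()   // doc id -> { type, doc }; the last change wins
  }

  record(type, doc) {
    this.pending.set(doc.id, { type, doc })
  }

  commit() {
    const count = this.pending.size
    for (const [id, change] of this.pending) {
      if (change.type === 'remove') this.store.delete(id)
      else this.store.set(id, JSON.stringify(change.doc))
    }
    this.pending.clear()
    return count
  }
}

const store = new Map()   // stand-in for localStorage, a file or a database
const tracker = new ChangeTracker(store)

tracker.record('add', { id: 'a', body: 'one' })
tracker.record('add', { id: 'b', body: 'two' })
tracker.record('update', { id: 'a', body: 'one, edited' })
tracker.record('remove', { id: 'b' })

const written = tracker.commit()   // touches only the two changed ids
```

Nothing is written until commit is called, and repeated changes to the same document collapse into one write.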

Venemo commented 11 years ago

Yeah, that would be a very good thing, especially when using lunr on the server side. :)

olivernn commented 11 years ago

The storing won't happen in lunr at all, these events are just to give users the hooks they need to implement their own storage.

You raise some valid points about having to serialise the whole index on each change. For some this won't be a problem, but for larger indexes it could be an excessive overhead. It'd be good to see some benchmarks of serialisation to be sure though.

@garysieling makes a good point that whatever hooks into these events could be a bit smarter about how it actually stores the index.

Another way would be to store many smaller indexes and merge them together again when loading a previously serialised index, see #29 for more details.

My aim with this feature is to provide the required hooks to be able to store the index. A more sophisticated solution can, and probably should, be built on top of these basic events.

ivanjr0 commented 11 years ago

It would be great if there was an easy way of persisting lunr index data incrementally to IndexedDB. I'm trying to use it inside a shared worker and persisting to localStorage, but it doesn't seem to scale well as data volume grows.

Venemo commented 11 years ago

Another useful approach would be to allow users of the API to specify their own storage backend for the index. This could mean, for example, passing in an object with the necessary methods (e.g. setValue, getValues, etc.) to store the inverted index, and the index would call these whenever the index changes or it needs data. The default storage backend could use IndexedDB in the browser, and we could implement our own way on the server as well.
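
Something like the following is what a pluggable backend could look like. This is only a sketch: `MemoryBackend` and `BackedIndex` are made up for illustration, the method signatures are assumptions, and the calls are synchronous, whereas a real IndexedDB backend would have to be asynchronous.

```javascript
// Hypothetical backend interface: setValue(key, values) and getValues(key).
class MemoryBackend {
  constructor() { this.data = new Map() }
  setValue(key, values) { this.data.set(key, values) }
  getValues(key) { return this.data.get(key) || [] }
}

// Toy inverted index that keeps its postings in whatever backend it is given.
class BackedIndex {
  constructor(backend) { this.backend = backend }

  add(doc) {
    const tokens = new Set(doc.body.toLowerCase().split(/\W+/).filter(Boolean))
    for (const token of tokens) {
      const ids = this.backend.getValues(token).slice()
      if (!ids.includes(doc.id)) ids.push(doc.id)
      this.backend.setValue(token, ids)   // only the touched tokens are written
    }
  }

  search(term) {
    return this.backend.getValues(term.toLowerCase())
  }
}

const backend = new MemoryBackend()
const idx = new BackedIndex(backend)
idx.add({ id: 1, body: 'Storage events for lunr' })
idx.add({ id: 2, body: 'Incremental storage' })

const hits = idx.search('storage')   // ids of documents containing the term
```

Since each add writes only the tokens it touches, the backend sees incremental changes instead of a full snapshot.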

olivernn commented 11 years ago

Simple storage events have been added in the latest release of lunr. This allows you to be notified of changes to the index, e.g.

idx.on('add', function (doc) {
  // do something here
})

idx.on('update', function (doc) {
  // do something here
})

idx.on('remove', function (doc) {
  // do something here
})
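
As one example of what a handler might do, here is a sketch that snapshots only every few changes instead of on every one, which is one way to soften the serialisation cost raised earlier in the thread. `snapshotEvery` and `fakeIdx` are hypothetical, and the sketch assumes the handler also receives the index as a second argument, as in the original proposal.

```javascript
// Hypothetical helper: returns a handler that snapshots every `every` changes.
function snapshotEvery(every, save) {
  let changes = 0
  return function (doc, index) {
    changes += 1
    if (changes % every === 0) save(JSON.stringify(index))
  }
}

// Stand-in for an index, with only the pieces this sketch needs.
const fakeIdx = {
  docs: [],
  handlers: [],
  on(name, fn) { this.handlers.push(fn) },
  add(doc) {
    this.docs.push(doc)
    this.handlers.forEach(fn => fn(doc, this))
  },
  // JSON.stringify calls toJSON, so handlers are never serialised.
  toJSON() { return { docs: this.docs } }
}

const snapshots = []   // stand-in for writes to real storage
fakeIdx.on('add', snapshotEvery(3, json => snapshots.push(json)))

for (let i = 1; i <= 7; i++) fakeIdx.add({ id: i })
```

The trade-off is that up to `every - 1` changes can be lost if the process stops between snapshots.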