olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License
8.91k stars 546 forks source link

First attempt at mutable indexes for lunr.js 2.x #315

Closed hoelzro closed 6 years ago

hoelzro commented 6 years ago

Hi there! I chimed in a separate issue about mutable indexes for 2.x, and I took a stab at adding them for my own purposes. This PR (which I don't expect you to merge if you don't want to - I just wanted to share my work, get some review, and start a conversation) is the result of that work. I'll be trying this for my FTS TiddlyWiki plugin - if it's something you're not interested in merging in, I can always publish it as a separate module, perhaps.

I made a few tradeoffs for this initial implementation:

JavaScript isn't a language that I "speak" fluently, so any feedback on idioms that I may have improperly used and how to correct them would be most welcome. I would appreciate any feedback on this, even "this won't work because of X, Y, and Z" because that would save me some headaches. =)

olivernn commented 6 years ago

I need to take a closer look through the change, but I think the general approach might work. Having a subclass of the immutable index that adds mutability. I especially like the fact that it can then be packaged as an extension that users can opt into if they need it.

Let me take a closer look through your implementation to see if there are any gotchas that you need to be aware of.

hoelzro commented 6 years ago

Sounds good - thanks for the quick response!

hoelzro commented 6 years ago

I added a few new commits - just trimming down the serialized size for mutable indexes a bit.

k00p commented 6 years ago

This PR looks great - it would be extremely useful for server side applications, or any use case requiring index maintenance. Anything I can do to help get this merged?

hoelzro commented 6 years ago

@k00p I put this here as a sort of proof-of-concept - I was actually thinking of publishing it as a separate extension for lunr.js. I've been using it on my own for a while now and it seems to work fine - I should probably just package it up on NPM =)

k00p commented 6 years ago

I went ahead and installed it as a node module directly from this pull request:

$ npm i --save olivernn/lunr.js\#pull/315/head

Note that the \# is an escape on the # character for my shell. Use in other shells may vary.

Test errors out of the box - the tests need to be added to test/env/file_list.json.

I am still not getting it to work yet, but I will keep trying to figure it out.

Update: Have it working now but am still validating updates and removes. If you aren't concerned with the errors for the npm test for the module, so far using a reference to this PR is successful. At this point there is nothing that would lead me to discourage accepting this PR, and I will update if anything comes up. FWIW the tokenizer persistence may be a show stopper.

Last Update: The tokenizer persistence issue ended up not being a big deal. I was concerned because I was attempting to use metadata to carry document data, but that was a fundamentally flawed strategy. The fix was to maintain a dictionary in addition to the index, and then ornament the search results from the dictionary. For my purposes, the increase in memory footprint is worth the gains in response times.

hoelzro commented 6 years ago

So I finally got around to bundling this PR up into a standalone NPM package: https://www.npmjs.com/package/lunr-mutable-indexes

It's my first NPM package, so any feedback on how I could improve it would be great!

andymcm commented 6 years ago

Just wanted to say that I'm using this and finding it extremely valuable. Would be great to see it as part of core Lunr.

k00p commented 6 years ago

I switched my focus as well to @hoelzro lunr-mutable-indexes project. The benefits for server side index maintenance greatly outweigh the drawbacks, and so far the project "piggybacks" on lunr, so it should continue to be a mutually beneficial relationship between the two repos.

hoelzro commented 6 years ago

Just want to update everyone monitoring this - @k00p was kind enough to lend me some of their time, so I integrated their changes and released version 0.2.0 to NPM! Thanks @k00p!

hoelzro commented 6 years ago

I'm going to go ahead and close this - I think that this functionality existing as a separate module is the best place for this, and I can always submit a new PR in the future if @olivernn wants me to fold this into lunr core.

NallT commented 6 years ago

@hoelzro This works great. I have server side code where the index is rather large and often updated.

Thanks!