olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Lunr.js as a search for static websites? #26

Open arafalov opened 11 years ago

arafalov commented 11 years ago

It would be super-amazing to have Lunr.js run with statically generated websites. I can imagine it running effectively in two modes: 1) Server-side mode during static site generation, when it does the indexing, similar to how the SASS precompiler works. That would generate a compressed index. 2) Client-side mode, which loads the index and does the search.

At the moment, statically generated sites work without search or use Google embedded search. This would open up a lot more options for them, including in presentation and faceting, avoiding generic template elements, etc.

olivernn commented 11 years ago

I think @slashdotdash has already created a plugin for jekyll that uses lunr to provide full text search on jekyll sites - https://github.com/slashdotdash/jekyll-lunr-js-search.

Lunr already allows you to build an index server side and then load this serialised index in the browser to search against.

All that you'd need to do is to be able to convert whatever format the static content is written in (e.g. markdown) into a JSON format that can be loaded into lunr for indexing. You would also need to be able to associate a url with each entry, perhaps as the ref that lunr uses to identify indexed documents.
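As a rough illustration of the two steps described above, here is a minimal sketch. It assumes each page is a Markdown string whose first "# " heading is the title; the helper names (markdownToDoc, buildSerialisedIndex) and the field names (url, title, body) are made up for this example and are not part of lunr. The lunr library is passed in as an argument so the snippet does not assume how it is loaded.

```javascript
// Sketch only: the field names (url, title, body) and both helpers are
// hypothetical, not part of lunr.

// Turn one Markdown page into a plain object that can be indexed.
// The first "# " heading becomes the title; everything else is the body.
function markdownToDoc(url, markdown) {
  const lines = markdown.split(/\r?\n/);
  const headingAt = lines.findIndex((line) => line.startsWith('# '));
  const title = headingAt === -1 ? '' : lines[headingAt].slice(2).trim();
  const body = lines
    .filter((_, i) => i !== headingAt)
    .join(' ')
    .replace(/\s+/g, ' ')
    .trim();
  return { url, title, body };
}

// Build an index that uses the page URL as the ref, then serialise it.
// `lunr` is passed in so the caller decides how the library is loaded.
function buildSerialisedIndex(lunr, docs) {
  const idx = lunr(function () {
    this.ref('url');
    this.field('title', { boost: 10 });
    this.field('body');
    docs.forEach((doc) => this.add(doc));
  });
  return JSON.stringify(idx);
}
```

On the server this could be called as buildSerialisedIndex(require('lunr'), docs) and the result written to a .json file. In the browser, lunr.Index.load(JSON.parse(json)) restores the index, and each search result's ref is then the page URL.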

If you get something up and running let me know, I'm always interested to see where lunr is being used!

slashdotdash commented 11 years ago

@arafalov I will happily accept a pull request to extend jekyll-lunr-js-search with pre-built indexing support.

arafalov commented 11 years ago

I am missing some of the skills to do that at the moment and my current focus is on Solr. :-( But perhaps after that.

olivernn commented 11 years ago

@slashdotdash does it not do this already? I thought it did, sorry! Let me know if you need a hand with adding this feature. I'm not familiar with jekyll plugins in general, but I am interested in how to generate indexes from Ruby for a project where I'm using lunr, so perhaps there is some overlap.

There is a simple server-side index builder, used for the example at http://lunrjs.com/example, that might be of some use - https://github.com/olivernn/lunr.js/blob/master/example/index_builder.js

arafalov commented 11 years ago

That example is still uncompressed though, right?

"body": "basically what I want to do is forward people to a ...."

I was thinking about:

'what': doc1, doc2, doc3
'want': doc3, doc55

Obviously a compressed and optimized version of that. Again, this is what Lucene does with its inverted index and what lunr.js must be doing after tokenization, right?
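To make the term-to-documents mapping concrete, here is a toy sketch of an inverted index. It is only an illustration of the idea: buildInvertedIndex is a hypothetical helper, the tokenizer is a naive split on non-word characters, and this is not how lunr stores its index internally.

```javascript
// Hypothetical helper: a toy inverted index, not lunr's internal structure.
// Maps each lower-cased token to the list of ids of documents containing it.
function buildInvertedIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    // Naive tokenization: lower-case, split on non-word characters.
    const tokens = doc.body.toLowerCase().split(/\W+/).filter(Boolean);
    // A Set ensures each document is listed once per token.
    for (const token of new Set(tokens)) {
      if (!index.has(token)) index.set(token, []);
      index.get(token).push(doc.id);
    }
  }
  return index;
}

const index = buildInvertedIndex([
  { id: 'doc1', body: 'basically what I want to do' },
  { id: 'doc2', body: 'What now?' },
]);
console.log(index.get('what')); // [ 'doc1', 'doc2' ]
```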

olivernn commented 11 years ago

http://lunrjs.com/example/example_index.json basically contains a serialised version of a lunr index, so it contains the inverted index as well as any other data structures that are required for lunr to be able to execute searches. If the actual file size becomes an issue then it can be compressed before being served, but this is outside of the scope of lunr I think.
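For what it's worth, compressing the serialised index ahead of time could look roughly like the sketch below, which uses Node's built-in zlib module. The function names compressIndex and decompressIndex are made up for this example; in many setups the web server would do the gzip step transparently instead.

```javascript
const zlib = require('zlib');

// Hypothetical helper: gzip a serialised index (a JSON string) into a Buffer.
function compressIndex(serialisedIndex) {
  return zlib.gzipSync(Buffer.from(serialisedIndex, 'utf8'));
}

// Hypothetical helper: reverse of compressIndex, returns the JSON string.
function decompressIndex(buffer) {
  return zlib.gunzipSync(buffer).toString('utf8');
}
```

The compressed Buffer can be written to disk next to the site's other assets and served with a Content-Encoding: gzip header, so the browser unpacks it before the JSON is parsed.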

arafalov commented 11 years ago

Oh, my mistake. I was looking at example_data.json instead of example_index.json. That's exactly what I was thinking about. Looks like the feature is implemented and maybe it just needs to be advertised more. Not sure if the Jekyll integration uses the compressed index though.

olivernn commented 11 years ago

Yeah, from @slashdotdash's comments it appears not.

As for documentation, I definitely agree with you, I need to get around to writing some more guide-like documentation as well as the API docs.

slashdotdash commented 11 years ago

@olivernn I was thinking of using node.js and some JavaScript, much like your index_builder.js, to convert the .json file.

slashdotdash commented 11 years ago

... or even via http://rubygems.org/gems/execjs

arafalov commented 11 years ago

I think Node.js or execjs is the best way to be DRY. And doesn't SASS compilation already have a similar pipeline? This could be just an 'index compilation' step in the assets pipeline.

olivernn commented 11 years ago

I think execjs is probably the right way to go.

Maybe there is room for a specific lunr-generate command line tool? Something like the following:

$ lunr-generate --fields title:10,body --ref id --pipeline stopWordFilter,stemmer input_data.json > serialised_index.json

This might be a little simpler than having to create a JavaScript file that sets up the index, reads in data and writes out data. It wouldn't be able to deal with any custom pipeline functions that people want to use, but for simple use cases it might be useful.
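Purely as a sketch of how such a tool might read its options: the code below parses the flags from the command above. The lunr-generate tool does not exist, so every name here (parseFields, parseArgs, the default values) is hypothetical, and the actual index building is left out.

```javascript
// Hypothetical: parse a field spec such as "title:10,body".
// A field without an explicit boost gets a boost of 1.
function parseFields(spec) {
  return spec
    .split(',')
    .filter(Boolean)
    .map((part) => {
      const [name, boost] = part.split(':');
      return { name: name.trim(), boost: boost === undefined ? 1 : Number(boost) };
    });
}

// Hypothetical: parse the argument list of the imagined lunr-generate tool.
// Each flag is expected to be followed by its value; any other argument is
// treated as the input file.
function parseArgs(argv) {
  const options = { fields: [], ref: 'id', pipeline: [], input: null };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--fields') options.fields = parseFields(argv[++i]);
    else if (arg === '--ref') options.ref = argv[++i];
    else if (arg === '--pipeline') options.pipeline = argv[++i].split(',');
    else options.input = arg;
  }
  return options;
}
```

In a real script this would be fed process.argv.slice(2), and the resulting options would drive the index setup before the serialised index is written to stdout.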

Not sure lunr-generate is the correct name for it either.

Anyway, just a thought for now…

brockfanning commented 11 years ago

For any Docpad users: https://github.com/brockfanning/docpad-plugin-lunr

bludrop commented 10 years ago

@pburtchaell if you haven't found it yet, there is a plugin being developed at https://github.com/assemble/assemble-contrib-lunr

rripken commented 10 years ago

I had a somewhat similar idea and just now came across this page.
I'm running Spring on Tomcat on the server side to generate the JSON that lunr would search. The JSON is mostly static, so it seemed silly to force all the clients (mobile) to download the JSON and then compute their own indexes locally. I found that building the index was taking the clients about 4 seconds. I was able to get lunr to work server-side via Java's ScriptEngineManager. I had the JavaScript ScriptEngine load es5-shim, console-shim, the json2 shim and lz-string-1.3.3. The server has the ScriptEngine load a bunch of JS source files and then calls a JavaScript method that builds the index, stringifies it and compresses it with lz-string. I could have relied on the server to do transparent gzip compression, but lz-string is better for my purposes because it gets reasonable compression that is also compatible with localStorage. The clients can cache the compressed indexes in localStorage. Because the input data is mostly static, I've set up a server-side cron task to periodically rebuild the server's compressed indexes.
I found it also takes about 4 seconds to generate the compressed index on the server side. I tried compiling the JavaScript (es5-shim, json2, lz-string, lunr) to a Java object via Rhino and the jsc compiler. The result worked but took about a second longer. I even tried doing the LZ compression via a Java implementation of lz-string, but that took longer still. If I need it to go faster I can probably generate the JSON in one thread and feed it to lunr in another. Considering how much processing it is going to save the clients, I'm happy with the 4 second number.

Just wanted to report back that it can work, even from Java.

Fil commented 9 years ago

I wanted a similar builder, but one that runs in the browser rather than node.js. To export the data back to a file I had to do this: https://gist.github.com/Fil/eca2a5626256668100d9

djfdev commented 9 years ago

FYI - I started working on a gulp plugin for this.

gulp-lunr

Not yet tested, and does not yet support configuration of fields/ref (just uses href and puts everything in "body").

matiasgarciaisaia commented 7 years ago

👋 lunr.js is actually the basis of manastech/middleman-search, a Middleman extension for client-side search.

The implementation was pretty much straightforward - it may also help you build your own plugin/extension for any other system you have to deal with.

lunr.js💙