olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Implement serialize #11

Closed: ssured closed this issue 11 years ago

ssured commented 11 years ago

Building the index for 7,000 strings of one to three words each is quite slow on a mobile device. One way to speed this up would be to store a generated index on the device and retrieve it later. To make that happen, lunr needs to be able to serialize its internal data and deserialize it again. JSON seems a nice fit. My understanding of the algorithm isn't deep enough to see how the data is stored or which part takes the most computing, but I'm willing to help.
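To make the request concrete, here is a minimal sketch of the JSON round trip being asked for. It uses a hypothetical toy inverted index (each token maps to the ids of the documents containing it) rather than lunr's real data structures, and the names `buildIndex`, `serializeIndex`, `loadIndex` and `search` are illustrative only, not lunr's API. The `version` field is likewise an assumed safeguard so that a dump written in some other format could be rejected on load.

```javascript
// Hypothetical toy index, NOT lunr's internal format: each lowercased
// token maps to the ids (array positions) of the documents containing it.
function buildIndex(docs) {
  const index = { version: 1, docCount: docs.length, postings: Object.create(null) };
  docs.forEach((text, id) => {
    // A Set keeps a single posting per token per document.
    const tokens = new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
    for (const token of tokens) {
      if (!Object.prototype.hasOwnProperty.call(index.postings, token)) {
        index.postings[token] = [];
      }
      index.postings[token].push(id);
    }
  });
  return index;
}

// Produce a plain JSON string that could be cached on the device
// (e.g. in localStorage) or generated on a server and sent to clients.
function serializeIndex(index) {
  return JSON.stringify(index);
}

// Rebuild a usable index from JSON without re-tokenizing any documents.
function loadIndex(json) {
  const data = JSON.parse(json);
  // Assumed safeguard: reject dumps written in an unknown format.
  if (data.version !== 1) {
    throw new Error("Unsupported index version: " + data.version);
  }
  return data;
}

// Exact-match lookup; hasOwnProperty avoids hits on inherited keys
// such as "constructor" once JSON.parse has produced a normal object.
function search(index, term) {
  const key = term.toLowerCase();
  return Object.prototype.hasOwnProperty.call(index.postings, key)
    ? index.postings[key]
    : [];
}

const exampleIndex = buildIndex(["red apple", "green apple", "red car"]);
const cached = serializeIndex(exampleIndex);
const restored = loadIndex(cached);
console.log(search(restored, "apple")); // [ 0, 1 ]
```

Loading skips tokenizing the original strings again, which is where any speed-up on mobile would come from; whether it actually wins depends on the size and shape of the real index.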

olivernn commented 11 years ago

This feature is definitely coming, it has been a very popular request!

With this in place, the index could be generated server side using the lunr Node module. That index could then be dumped and sent out to clients.

There are a number of data structures in the index that would need to be serialised. Most are probably relatively simple; the trie, however, is a heavily nested object and might need a little thought about the best way to both dump and load it.
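A minimal sketch of two obvious options for the trie, assuming a simplified character trie built from plain nested objects. This is not lunr's actual layout, and `newNode`, `insert`, `lookup`, `dumpNested`, `loadNested`, `dumpFlat` and `loadFlat` are hypothetical names. Dumping the nested object as-is works with plain `JSON.stringify` because a trie is a tree with no cycles; the alternative flattens it to `[word, docIds]` pairs and rebuilds the trie on load.

```javascript
// Hypothetical trie node: "children" is keyed by a single character,
// "docs" holds ids of documents containing the word ending at this node.
function newNode() {
  return { children: Object.create(null), docs: [] };
}

function insert(root, word, docId) {
  let node = root;
  for (const ch of word) {
    if (!Object.prototype.hasOwnProperty.call(node.children, ch)) {
      node.children[ch] = newNode();
    }
    node = node.children[ch];
  }
  if (!node.docs.includes(docId)) node.docs.push(docId);
}

// Exact-word lookup; a prefix that ends mid-word returns an empty list.
function lookup(root, word) {
  let node = root;
  for (const ch of word) {
    if (!Object.prototype.hasOwnProperty.call(node.children, ch)) return [];
    node = node.children[ch];
  }
  return node.docs;
}

// Option 1: dump the nested structure directly. No node is reachable
// by two paths, so JSON.stringify needs no cycle handling.
function dumpNested(root) {
  return JSON.stringify(root);
}
function loadNested(json) {
  return JSON.parse(json);
}

// Option 2: flatten to [word, docIds] pairs with a depth-first walk,
// then rebuild the trie on load by re-inserting every word.
function dumpFlat(root) {
  const pairs = [];
  (function walk(node, prefix) {
    if (node.docs.length > 0) pairs.push([prefix, node.docs]);
    for (const ch of Object.keys(node.children)) {
      walk(node.children[ch], prefix + ch);
    }
  })(root, "");
  return JSON.stringify(pairs);
}
function loadFlat(json) {
  const root = newNode();
  for (const [word, ids] of JSON.parse(json)) {
    for (const id of ids) insert(root, word, id);
  }
  return root;
}

const demo = newNode();
[["apple", 0], ["apply", 1], ["ape", 2], ["apple", 3]].forEach(([w, id]) => insert(demo, w, id));
const viaNested = loadNested(dumpNested(demo));
const viaFlat = loadFlat(dumpFlat(demo));
console.log(lookup(viaNested, "apple"), lookup(viaFlat, "apple")); // [ 0, 3 ] [ 0, 3 ]
console.log(dumpFlat(demo).length, dumpNested(demo).length);
```

In this toy layout the flat dump is shorter because every nested node repeats the `children`/`docs` keys, but loading it means re-inserting every word, while the nested dump is usable as soon as `JSON.parse` returns. Which trade-off is preferable probably depends on whether transfer size or load time matters more.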

Whenever we talk about performance, having decent benchmarks is always helpful. There are currently some very basic performance tests in the perf folder. Any help making these a bit more comprehensive and easily runnable would be really useful.
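One low-effort way to make such checks easily runnable is a tiny harness with warm-up runs and a median timing. The sketch below is hypothetical (it is not the contents of the perf folder), assumes a runtime with a global `performance.now()` (modern browsers, Node 16+), and uses a synthetic workload of 7,000 strings of one to three words to compare rebuilding a toy token index against parsing a JSON dump of it. `bench`, `makeDocs` and `buildPostings` are illustrative names.

```javascript
// Run fn a few times to warm up, then time `runs` calls and report the
// median wall-clock duration in milliseconds.
function bench(label, fn, { warmup = 3, runs = 10 } = {}) {
  for (let i = 0; i < warmup; i++) fn();
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  const mid = Math.floor(times.length / 2);
  const median = times.length % 2 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
  console.log(`${label}: median ${median.toFixed(3)} ms over ${runs} runs`);
  return median;
}

// Synthetic, deterministic workload: n strings of 1 to 3 words each.
const vocab = ["alpha", "beta", "gamma", "delta", "omega", "sigma", "kappa", "theta"];
function makeDocs(n) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    const len = 1 + (i % 3);
    const words = [];
    for (let j = 0; j < len; j++) {
      words.push(vocab[(i * 7 + j * 3) % vocab.length] + (i % 100));
    }
    docs.push(words.join(" "));
  }
  return docs;
}

// Toy index: token -> ids of the documents containing it.
function buildPostings(docs) {
  const postings = Object.create(null);
  docs.forEach((text, id) => {
    for (const token of text.toLowerCase().split(/\s+/)) {
      if (!(token in postings)) postings[token] = [];
      postings[token].push(id);
    }
  });
  return postings;
}

const benchDocs = makeDocs(7000);
const benchJson = JSON.stringify(buildPostings(benchDocs));
bench("rebuild index", () => buildPostings(benchDocs));
bench("load from JSON", () => JSON.parse(benchJson));
```

The toy index is far simpler than lunr's, so its timings say nothing about lunr itself; the same `bench` helper could wrap real index building and loading instead.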

ssured commented 11 years ago

I'll have a look into performance and the trie.

It seems there are two use cases for serialization:
1. server-side generation of the index
2. storing the tree in HTML5 web storage

For case 1 the storage size should be minimized; for case 2 the focus is on loading speed.

Maybe cyclic JSON support could help with the complex data format: https://github.com/douglascrockford/JSON-js (see cycle.js)

olivernn commented 11 years ago

I've added a branch with an implementation of serialisation (#14). Please try it out and let me know your feedback; this will make it into a 0.3.0 release soon.

ssured commented 11 years ago

Wow, great stuff! I'm going to check it out tomorrow and will report back on my findings!

(Sent by email in reply to Oliver Nightingale's comment of Mar 13, 2013, at 11:44 AM.)

olivernn commented 11 years ago

Released in version 0.3.0