olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Problem with serializing/deserializing index to/from JSON #52

Closed mdlawson closed 11 years ago

mdlawson commented 11 years ago

After making an index like so:

index = lunr(function(){
      this.ref('id');
      this.field('title', {boost: 10});
      this.field('text');
});

And calling index.toJSON() to create a storable representation, and then reloading the index from it:

newIndex = lunr.Index.load(index.toJSON());

The new index seems to be "broken" in some way. Attempting a search, for instance, throws TypeError: Cannot read property 'tf' of undefined.

Maybe I'm making some simple error here, but it seems a little strange.

brockfanning commented 11 years ago

I think lunr.Index.load expects a string representation of the JSON, so you might have to do this:

newIndex = lunr.Index.load(JSON.stringify(index.toJSON()));

mdlawson commented 11 years ago

@brockfanning the implementation of lunr.Index.load() doesn't seem to suggest that at all, though I haven't tested it.

brockfanning commented 11 years ago

Ah, you're right. I'm doing that because I'm writing the index to a file. Sorry, I don't see why your example wouldn't work. Have you tried indexing some items before attempting to export/import?

brockfanning commented 11 years ago

Actually that shouldn't matter. Maybe a browser issue?

mdlawson commented 11 years ago

I have been testing with items added to the index; I just didn't show it. I first hit this issue in Chrome 32, and I have just reproduced it in Firefox 23 as well, though with a different error: TypeError: this.tokenStore.get(...)[t] is undefined.

brockfanning commented 11 years ago

I'm stumped. If I paste the two statements from your first post into the console on a page that includes lunr.min.js, I don't get any errors.

brockfanning commented 11 years ago

index = lunr(function(){
      this.ref('id');
      this.field('title', {boost: 10});
      this.field('text');
});

lunr.Index {_fields: Array[2], _ref: "id", pipeline: lunr.Pipeline, documentStore: lunr.Store, tokenStore: lunr.TokenStore…}

newIndex = lunr.Index.load(index.toJSON());

lunr.Index {_fields: Array[2], _ref: "id", pipeline: lunr.Pipeline, documentStore: lunr.Store, tokenStore: lunr.TokenStore…}

newIndex.add({id:1,title:'My Title',text:'My Text'})

undefined

newIndex.search('text')

[ Object { ref: "1", score: 0.0995037190209989, __proto__: Object } ]

mdlawson commented 11 years ago

try:

index = lunr(function(){
      this.ref('id');
      this.field('title', {boost: 10});
      this.field('text');
});
index.add({id:1,title:'My Title',text:'My Text'});
newIndex = lunr.Index.load(index.toJSON());
newIndex.search('text');

Gets the error in all browsers I've tried.

brockfanning commented 11 years ago

Aha, I get the error now too.

Well, not sure why, but apparently stringifying and parsing it is the reason I wasn't seeing that error in my own app:

index = lunr(function(){
      this.ref('id');
      this.field('title', {boost: 10});
      this.field('text');
});
index.add({id:1,title:'My Title',text:'My Text'});
indexStringified = JSON.stringify(index.toJSON());
indexParsed = JSON.parse(indexStringified);
newIndex = lunr.Index.load(indexParsed);
newIndex.search('text');

olivernn commented 11 years ago

As @brockfanning mentioned, you must serialise the index by using JSON.stringify:

var idx = lunr(function(){
      this.ref('id')
      this.field('title', { boost: 10 })
      this.field('text')
})

idx.add(doc)

JSON.stringify(idx)

JSON.stringify matters because it recursively calls toJSON on every object that toJSON returns. In most cases this makes no difference (which is why diagnosing this issue is tricky!). However, a couple of objects act as stores for other objects, specifically lunr.Store. The return value of calling toJSON directly on a lunr.Store differs from the output of JSON.parse(JSON.stringify(store)), because JSON.stringify goes through each item in the store and makes sure toJSON is called on it; calling lunr.Store.prototype.toJSON alone does not do this.
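
The difference between a direct toJSON call and a full stringify/parse round trip can be sketched without lunr. The Item and Store classes below are hypothetical stand-ins made up for illustration, not lunr's actual implementation; they only assume a store whose toJSON returns its contents without converting them.

```javascript
// Hypothetical stand-ins for illustration; these are NOT lunr's real classes.
class Item {
  constructor(values) { this.values = values; }
  // Serialises to a plain array.
  toJSON() { return this.values; }
}

class Store {
  constructor() { this.store = {}; }
  set(key, item) { this.store[key] = item; }
  // Shallow: hands back the live Item objects, not their JSON form.
  toJSON() { return { store: this.store }; }
}

const store = new Store();
store.set('doc1', new Item(['my', 'text']));

// Calling toJSON directly leaves nested objects unconverted.
const shallow = store.toJSON();

// JSON.stringify calls toJSON on the store and then on every nested Item.
const deep = JSON.parse(JSON.stringify(store));

console.log(shallow.store.doc1 instanceof Item); // true
console.log(Array.isArray(deep.store.doc1));     // true
```

Code that expects plain arrays in the serialised form works with `deep` but not with `shallow`, which would match the kind of failure reported in this thread.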

I think this is a case where the documentation could be improved, making it clearer how to serialise an index as well as marking the toJSON methods as private. I am currently working on a big overhaul of the documentation for lunr, so hopefully other people will not run into the same problem.
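
For the write-to-file case mentioned earlier in the thread, a Node sketch of the round trip might look like the following. The helper names saveSerialised and loadSerialised, the file name, and the fakeIndex object are all assumptions made for illustration; fakeIndex stands in for a real lunr index so the snippet runs on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helpers, not part of lunr.
function saveSerialised(filePath, value) {
  // JSON.stringify calls toJSON on value and on every nested object.
  fs.writeFileSync(filePath, JSON.stringify(value), 'utf8');
}

function loadSerialised(filePath) {
  // Returns plain data only: objects, arrays, strings, numbers.
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Stand-in for an index; a real one would come from lunr(...).
const fakeIndex = {
  toJSON() { return { fields: ['title', 'text'], ref: 'id' }; }
};

const filePath = path.join(os.tmpdir(), 'index-sketch.json');
saveSerialised(filePath, fakeIndex);
const plain = loadSerialised(filePath);

console.log(plain.ref); // id
```

With lunr loaded, the parsed object would then be passed to lunr.Index.load, as in the working example earlier in the thread.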