Closed DannyNemer closed 11 years ago
Found the bug!
As demonstrated below, when loading a serialized index, lunr.js incorrectly loads the serialized index's documentStore.
First, create an index, add a document to the index, and log the index.documentStore.store
to show its correct form:
var index = lunr(function () { //create index
this.field('title')
this.ref('id')
})
index.add({id: 100, title: 'apple' }) //add document to index
console.log(index.documentStore.store) //log documentStore.store before serializing index
The following is the log of the index.documentStore.store
in its correct form:
{ '100': { length: 1, elements: [ 'appl' ] } }
Then, serialize the index and store it to the database:
user.index = index.toJSON() //serialize index and store it in the database
user.save() //save changes to database
Load the serialized index and log the index.documentStore.store
to show it in its incorrect form:
index = lunr.Index.load(user.index) //load previously serialized index from database
console.log(index.documentStore.store) //log incorrect documentStore after loading serialized index
The following is the log of the index.documentStore.store
in its incorrect form:
{ '100': { length: 1, elements: { length: 1, elements: [Object] } } }
Bug: The bug occurs on line 962
of lunr.js
in the lunr.Store.load()
function, which is called in lunr.Index.load()
. Simply, this is fixed by changing
memo[key] = lunr.SortedSet.load(serialisedData.store[key])
to
memo[key] = lunr.SortedSet.load(serialisedData.store[key].elements)
Thanks for a very detailed bug report, it makes it much easier to try and get to the bottom of what is going wrong! Could you let me know what version of lunr you are using, both for serialising and loading the index? You can get this using lunr.version
.
I'm using version 0.3.0. But as you can see, I already found the bug in my follow-up comment and I submitted a pull request with the fix.
Yep, I saw, just trying to re-produce the error locally myself.
Ok, I've looked into this some more and I think it is an issue with how you are serialising/loading the index. The index object provides a toJSON
method, this is provided so that when the index object is serialised using JSON.stringify
the whole index is correctly serialised as JSON.
Using JSON.stringify
will handle recursively correctly serialising each object to JSON. By directly calling toJSON
on the index you are bypassing this. For most cases this doesn't matter, however in the case of the lunr.Store
it's toJSON
method is not calling toJSON
on every object in the store.
lunr.Store.prototype.toJSON = function () {
return {
store: this.store,
length: this.length
}
}
The issue is then that index.toJSON
does not return the same as JSON.parse(JSON.stringify(index))
, I'm not sure this is a major issue, although I don't know your use case…
There are a couple of ways this could be fixed, both would be just trying to achieve more consistency.
lunr.Store.prototype.toJSON
call toJSON
on every object within the store, (these will always be lunr.SortedSet
)toJSON
on child objects as this is handled by JSON.stringify
I see. Thank you for your explanation.
As is, index.toJSON()
calls the following function:
lunr.Store.prototype.toJSON = function () {
return {
store: this.store,
length: this.length
}
}
This returns the following serialized index.documentStore
:
"documentStore" : {
"length" : 3,
"store" : {
"1" : {
"length" : 1,
"elements" : [
"appl"
]
},
"2" : {
"length" : 1,
"elements" : [
"orang"
]
}
}
}
As a result, when I use lunr.Index.load(index)
later, followed by index.search('apple')
, the error I explained above occurs. Am I correct to conclude that currently in lunr.js
, a perviously serialized index cannot successfully be searched without receiving this error? I ask because you mentioned that this might be dependent on my use case.
For now, I have changed lunr.Store.prototype.toJSON
to the following:
lunr.Store.prototype.toJSON = function () {
return {
store: JSON.parse(JSON.stringify(this.store)),
length: this.length
}
}
As you explained, using JSON.stringify
correctly serializes each object in index.documentStore
to JSON:
documentStore" : {
"length" : 3,
"store" : {
"1" : [
"appl"
],
"2" : [
"orang"
]
}
}
As a result, index.search('apple')
works as intended.
Furthermore, am I correct to conclude that the reason for your implementation of toJSON
, instead of simply calling JSON.parse(JSON.stringify(index))
, is to call lunr.Pipeline.warnIfFunctionNotRegistered
when serializing the pipeline? Other than that and the issue discussed above, are toJSON
and JSON.parse(JSON.stringify(index))
identical?
Thank you very much.
You should absolutely be able to search within a index loaded from a previously serialised index without encountering errors!
The way to correctly serialise a lunr index is to just use JSON.stringify
e.g. JSON.stringify(index)
. This will return a JSON string representing the index. You can then do whatever you want with this string, save it somewhere, send it to a web worker, wherever you need to start up a copy of the original index somewhere else at another time.
To re-load an index you then need to parse this json and pass it to lunr.Index.load
e.g. lunr.Index.load(JSON.parse(serialisedIndex))
. This will return you a new index with the exact same data as you original index, you will be able to search without any errors etc, if not there is a bug in lunr (please report it!).
To conclude, you probably shouldn't need to call toJSON
directly on the index, it is provided for the internals of how JSON.stringify
works. Hence my question about your use case, why do you call index.toJSON
directly instead of using JSON.stringify(index)
? Why do you want a plain object representation of the index rather than a JSON string?
I misunderstood. The reason I have been calling index.toJSON
instead of JSON.stringify(index)
was because index.toJSON
was documented as lunr
's method for returning a representation of the index ready for serialization, and has additional functionality specific to lunr
that JSON.stringify
does not, such as ensuring pipeline functions are registered in lunr.Pipeline.prototype.toJSON
. As a result, I was under the false impression that index.toJSON
was preferred. Nevertheless, I now understand that I was mistaken, and lunr
works as intended.
Thank you very much for your help, as well as for your wonderful work.
No worries, the documentation around serialisation could probably do with a little clarification, I'll see if I can make it a little clearer.
I found the bug. You can read my explanation in the first comment below.
When I search a previously serialized index, having loaded it using
lunr.Index.load()
, I receive aTypeError
(shown in detail below). The following are steps to reproduce this error:First, I create an index, add a document to the index, and log the index to ensure it is working:
The following is the result of the
console.log(index)
. As you see, everything is working correctly:I then serialize the index and store it in my database:
Later, I load the previously serialised index and log the index to ensure it is working:
The following is the result of the
console.log(index)
. As you see, everything is working correctly and is identical to before:Again, I will add a document to the index and log the index to ensure it is working after having loaded the previously serialized index:
The following is the result of the
console.log(index)
. As you see, everything is still working correctly:Here is the problem. I will now attempt to search the index for a document that was added before serializing the index (
{id: 1, title: 'apple' }
):This is there error I receive:
This is the code in its entirety:
Note: If, at the final line, I searched the index for
'ba'
, it would correctly return{id: 2, title: 'banana' }
, which was added after loading the index. Also, if I pass a query that normally would not return any result, such as'da'
, it would correctly not return anything, without error.Thank you very much.
I found the bug. You can read my explanation in the first comment below.