olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Cannot read property 'similarity' of undefined #352

Closed miraks31 closed 6 years ago

miraks31 commented 6 years ago

On some searches I get the following exception, and I haven't been able to find the cause:

Cannot read property 'similarity' of undefined
TypeError: Cannot read property 'similarity' of undefined
    at lunr.Index.query (C:\Users\to81591\Documents\Development\Cesar_ws\node_modules\lunr\lunr.js:2214:50)
    at lunr.Index.search (C:\Users\to81591\Documents\Development\Cesar_ws\node_modules\lunr\lunr.js:1924:15)
    at Server.requestHandler (C:\Users\to81591\Documents\Development\Cesar_ws\server.js:99:8)
    at emitTwo (events.js:126:13)
    at Server.emit (events.js:214:7)
    at parserOnIncoming (_http_server.js:602:12)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:116:23)

What could be the reason? Is it a bug, or am I using Lunr incorrectly? Regards.

olivernn commented 6 years ago

It's difficult to tell from the exception alone. Could you provide a reproduction, or at least the code that is generating this error?

miraks31 commented 6 years ago

Cesar_ws.zip

Hi, you can try it like this:

1. You need Node.js.
2. Unzip the attached file into a directory.
3. Launch node server.js.
4. In a web browser, enter the following URL: http://127.0.0.1:8080/search2/application_a/c_program_&_natcos:A350

Thank you in advance for your help. Regards.

olivernn commented 6 years ago

@miraks31 can you provide a reduced/simplified reproduction? Something on jsfiddle (or similar) is much easier to work with.

miraks31 commented 6 years ago

No, I can't. I think the issue is caused by some of the indexed data, but I wasn't able to identify the cause. That's why I can't produce a minimal reproduction.

olivernn commented 6 years ago

Maybe I can give you a couple of pointers that might help you dig into the code to understand what is going on.

similarity is a method on lunr.Vector, an object that holds the vector space representation of a document's field. You can see all of these in your index under idx.fieldVectors. They are keyed by field/document_ref.

So somehow Lunr is looking for a field vector for a document ref/field pair that doesn't exist. The first step in debugging would be to figure out which vector it is looking for; a well-placed console.log would probably do the trick.

I'm guessing now, but perhaps a document being indexed doesn't have a particular field? If so then Lunr probably could handle this a bit more gracefully.
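As a rough sketch of that first debugging step, the helper below lists which field/document_ref keys are absent from a vector map. The function name missingVectorKeys and its parameters are hypothetical, and it assumes the map is a plain object keyed by "field/ref" strings, as described above.

```javascript
// Hypothetical debugging helper, not part of Lunr.
// Assumes fieldVectors is a plain object keyed by "field/document_ref".
function missingVectorKeys(fieldVectors, fieldNames, docRefs) {
  const missing = [];
  for (const field of fieldNames) {
    for (const ref of docRefs) {
      const key = field + "/" + ref;
      // Record every field/ref pair that has no vector in the map.
      if (!Object.prototype.hasOwnProperty.call(fieldVectors, key)) {
        missing.push(key);
      }
    }
  }
  return missing;
}
```

Calling it with idx.fieldVectors, the list of field names, and the list of document refs should show which pairs have no vector.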

miraks31 commented 6 years ago

Hi, I couldn't reproduce EXACTLY the same issue, but I think the one I reproduced is related to it. I say that because I dumped some variables and got this just before the error:

fieldRef.fieldName=application_a queryVectors[fieldRef.fieldName]=application_a

In my model, I don't have an attribute named "application_a", but I do have one named "application_a/c_program_&_natcos".

Here is the link to reproduce at least an error: http://jsfiddle.net/0rwtspc2/36/

miraks31 commented 6 years ago

One more comment: as you can see, my attribute names contain "_". I did that because I think Lunr doesn't support spaces in attribute names. If I have an attribute named "my attribute" and want to search on it, I would like to write something like my attribute:value, but Lunr raises an error because "attribute" is not an attribute. I also tried "my attribute":value, without success.

miraks31 commented 6 years ago

If I replace "/" with "_" in the attribute names, it works well.
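A minimal sketch of that workaround, applied before the documents are handed to the index. The helper names sanitizeFieldName and sanitizeDocument are hypothetical, and the sketch assumes documents are flat objects whose keys are the attribute names.

```javascript
// Hypothetical helpers, not part of Lunr.
// Replace every "/" in an attribute name with "_".
function sanitizeFieldName(name) {
  return name.replace(/\//g, "_");
}

// Return a copy of a flat document with all of its keys sanitized.
function sanitizeDocument(doc) {
  const clean = {};
  for (const key of Object.keys(doc)) {
    clean[sanitizeFieldName(key)] = doc[key];
  }
  return clean;
}
```

The same sanitizeFieldName call would have to be applied to the field names declared on the index and used in queries. Two attributes that differ only by "/" versus "_" would collide after renaming.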

olivernn commented 6 years ago

@miraks31 good investigation, yeah, I think the field name containing a "/" is the problem here. When Lunr combines the field and document ref into a single string to key into the vector map, it does so with a "/". It seems likely that some piece of code inside Lunr is not handling multiple slashes correctly. I'll take a look later to see if this can be improved in Lunr. For now the right mitigation is to avoid using the "/" character in your field names.
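To illustrate the suspected failure, here is a simplified model of joining a field name and a document ref with "/" and splitting the result again at the first slash. It is a sketch, not Lunr's actual source: toKey, parseKey, and the document ref "1" are assumptions made for the example.

```javascript
// Simplified model of a "field/docRef" key; names are hypothetical.
const JOINER = "/";

function toKey(fieldName, docRef) {
  return fieldName + JOINER + docRef;
}

// Split at the FIRST joiner, which is where a "/" in the field name goes wrong.
function parseKey(key) {
  const n = key.indexOf(JOINER);
  if (n === -1) throw new Error("malformed field ref string");
  return { fieldName: key.slice(0, n), docRef: key.slice(n + 1) };
}

const parsed = parseKey(toKey("application_a/c_program_&_natcos", "1"));
// parsed.fieldName is "application_a"
// parsed.docRef is "c_program_&_natcos/1"
```

Under this model the parsed field name is truncated to "application_a", which matches the value dumped just before the error.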

olivernn commented 6 years ago

So, I tried and failed to find a reasonable way to handle fields containing "/". I could get something working but not without an unreasonable impact on performance for all searches, even those against indexes that do not contain fields with a slash in them.

Instead I'm going to take the easy way out and add a check at build time for fields containing unsupported characters and throw an error. This will either land in a patch release or the next minor release.
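A build-time check of this kind could look roughly like the sketch below. The function name assertValidFieldName, the error type, and the message are assumptions for illustration, not the code that ships in Lunr.

```javascript
// Hypothetical validation, run once per field when the index is built.
function assertValidFieldName(fieldName) {
  if (fieldName.includes("/")) {
    throw new RangeError(
      "Field '" + fieldName + "' contains unsupported character '/'"
    );
  }
}
```

Failing at build time costs one check per field and adds nothing to the search path, which is the trade-off described above.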

Thanks again for reporting this and sorry if you lost time trying to figure out what was going on!

miraks31 commented 6 years ago

Hi Oliver,

I changed my code to replace / with _. This works perfectly! It is a good idea to check at build time that field names are valid. Thank you again for this great tool.

olivernn commented 6 years ago

The latest release will throw an error if you try to use a "/" character in a field name.