weixsong / elasticlunr.js

Based on lunr.js, but more flexible and customized.
http://elasticlunr.com
MIT License

Does not match on expected terms #45

Open aaroncraig10e opened 7 years ago

aaroncraig10e commented 7 years ago

Certain terms do not produce matches as expected.

For instance, given the following docs:

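    // Assumed index setup (not shown in the original report), following the elasticlunr README.
    var idx = elasticlunr(function () {
      this.addField('title');
      this.addField('body');
      this.setRef('id');
    });
    ;([{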
      id: 'a',
      title: 'Mr. Green kills Colonel Mustard',
      body: 'Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.',
      wordCount: 19
    },{
      id: 'b',
      title: 'Plumb waters green plant ',
      body: 'Professor Plumb has a green plant in his study',
      wordCount: 9
    },{
      id: 'c',
      title: 'Scarlett helps Professor',
      body: 'Miss Scarlett watered Professor Plumbs green plant while he was away from his office last week.',
      wordCount: 16
    },{
      id: 'd',
      title: 'title',
      body: 'handsome',
    },{
      id: 'e',
      title: 'title abc',
      body: 'hand',
    }]).forEach(function (doc) { idx.addDoc(doc); });

A search for the term candlestick does not produce a hit.

I'm guessing this has to do with the stemmer, as in Elasticsearch I've had mixed results using the Porter stemmer.

We are using this library for a new project, so I am happy to work on a fix and send a PR. I just wanted to post the issue here for anyone else running into the same problem.

aaroncraig10e commented 7 years ago

I've discovered that the problem is in the tokenizer, which does not strip punctuation. Working on a fix now.
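To illustrate the diagnosis, here is a minimal sketch that does not depend on the library. Both function names are hypothetical and are not part of elasticlunr's API. It compares a whitespace-only split, which leaves the trailing period attached to the token, with a split that treats any run of non-alphanumeric characters as a separator. It assumes ASCII text; accented letters would also be treated as separators.

```javascript
// Hypothetical helpers for illustration only; not elasticlunr's tokenizer.

// Splits on whitespace only, so "candlestick." keeps its trailing period.
function tokenizeOnWhitespace(text) {
  return String(text).toLowerCase().split(/\s+/).filter(Boolean);
}

// Splits on any run of characters that are not ASCII letters or digits,
// which drops punctuation entirely.
function tokenizeStrippingPunctuation(text) {
  return String(text).toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

var body = 'in the study with the candlestick. Mr. Green';

console.log(tokenizeOnWhitespace(body));
// [ 'in', 'the', 'study', 'with', 'the', 'candlestick.', 'mr.', 'green' ]

console.log(tokenizeStrippingPunctuation(body));
// [ 'in', 'the', 'study', 'with', 'the', 'candlestick', 'mr', 'green' ]
```

With the first tokenizer, a query for candlestick never equals the indexed token candlestick., which would explain the missing hit.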

webOS101 commented 7 years ago

@aaroncraig10e Did you land a fix for this?

aaroncraig10e commented 7 years ago

I ended up making my own package, as there were some other issues that were difficult to fix without changing the interface.

https://github.com/10eTechnology/esjs

samdutton commented 5 years ago

Apropos of this: is there a way to get Elasticlunr to handle punctuation, i.e. not strip out all punctuation characters?

For example, I've built a Shakespeare search app at shearch.me, but it doesn't cope with queries such as call'd (which are common in texts of this kind).
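One possible approach, sketched here outside the library, is a tokenizer that keeps apostrophes inside a word while still dropping surrounding punctuation. The function name is hypothetical, and whether Elasticlunr can be configured to use such a tokenizer is not established in this thread. The sketch assumes ASCII letters and normalizes the typographic apostrophe to a straight one.

```javascript
// Hypothetical helper for illustration only; not part of elasticlunr's API.
// Keeps apostrophes that sit between alphanumeric runs (call'd, answer'd)
// and drops all other punctuation, including leading or trailing apostrophes.
function tokenizeKeepingApostrophes(text) {
  var normalized = String(text)
    .toLowerCase()
    .replace(/\u2019/g, "'"); // typographic apostrophe -> straight apostrophe
  // One or more alphanumerics, optionally followed by ('alphanumerics) groups.
  return normalized.match(/[a-z0-9]+(?:'[a-z0-9]+)*/g) || [];
}

console.log(tokenizeKeepingApostrophes("He call'd, and I answer'd."));
// [ 'he', "call'd", 'and', 'i', "answer'd" ]
```

The same function would have to be applied to the query string as well as to the indexed text, and any stemming step that runs afterwards may still alter tokens such as call'd.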