olivernn / lunr.js

A bit like Solr, but much smaller and not as bright
http://lunrjs.com
MIT License

Fixed stemming of words ending in the letter 'y' according to the Porter2 algorithm #84

Closed: MihaiValentin closed this pull request 10 years ago

MihaiValentin commented 10 years ago

While searching for the word lay, I got no results, even though my documents contained many occurrences of the word layout.

While debugging, I found that lay was being stemmed to lai in step 1c of the Porter2 algorithm.

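Step 1c of Porter2 says a final y should be replaced by i only when it is preceded by a non-vowel that is not the first letter of the word (so cry becomes cri, while by and say are left unchanged). Because the y in lay follows the vowel a, lay should have stayed lay after this step.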
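A minimal sketch contrasting the two behaviours for step 1c in isolation. This is not lunr's actual stemmer code: the function names step1cOriginalPorter and step1cPorter2 are hypothetical, the input is assumed to be a lowercase word, and the "original" variant uses a simplified vowel test (any of a, e, i, o, u, y in the stem) to illustrate the older "stem contains a vowel" condition that turns lay into lai.

```javascript
// Older Porter (1980) style rule, simplified: replace a final "y" with "i"
// if the rest of the word contains a vowel. Under this rule "la" contains "a",
// so "lay" becomes "lai".
function step1cOriginalPorter(word) {
  const m = /^(.+?)y$/.exec(word);
  if (m && /[aeiouy]/.test(m[1])) return m[1] + "i";
  return word;
}

// Porter2 rule: replace a final "y" or "Y" with "i" only if it is preceded by
// a non-vowel (vowels are a, e, i, o, u, y) that is not the first letter of
// the word. The leading ".+?" requires at least one character before that
// non-vowel, which excludes words like "by".
function step1cPorter2(word) {
  const m = /^(.+?[^aeiouy])[yY]$/.exec(word);
  return m ? m[1] + "i" : word;
}

console.log(step1cOriginalPorter("lay")); // "lai"
for (const w of ["lay", "cry", "by", "say", "happy"]) {
  console.log(w, "->", step1cPorter2(w)); // lay, cri, by, say, happi
}
```

Keeping the condition in the regular expression (a non-vowel immediately before the y) means a vowel-plus-y ending such as lay or say never matches, so no separate vowel check on the stem is needed.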

This PR contains the correct implementation of step 1c. I've also added two test fixtures.

jure commented 10 years ago

:+1:

https://github.com/mdirolf/pyporter2/blob/master/Stemmer.py#L135-L140

https://github.com/nemec/porter2-stemmer/blob/master/Porter2Stemmer.UnitTest/EnglishPorter2StemmerUnitTest.cs#L749-L790

https://github.com/kristopolous/Porter2-Stemmer/blob/master/validate_PorterStemmer2/ReferenceTable.js#L26865

https://github.com/kristopolous/Porter2-Stemmer/blob/master/validate_PorterStemmer2/ReferenceTable.js#L14943

olivernn commented 10 years ago

Good spot, thanks for providing a fix!

Before I merge this, could you please remove the build artefacts (lunr.js and lunr.min.js) from the commit?

MihaiValentin commented 10 years ago

I removed the build artifacts.

olivernn commented 10 years ago

I've just pushed 0.5.3 which includes your fix for this bug, thanks!