Closed by joshdhenry 5 years ago
Hi @joshdhenry!
Judging from a cursory test using Python's NLTK framework, it looks like this is just the Porter stemmer in action, as you suggested:
Python 3.7.2 (default, Jan 10 2019, 23:51:51)
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.stem.porter import PorterStemmer
>>> s = PorterStemmer()
>>> s.stem('compete')
'compet'
>>> s.stem('competitive')
'competit'
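The two words reduce to different stems ('compet' and 'competit'), so a search for one never matches the index entry for the other. To answer your question about making the search do the right thing for "compete" and "competitive", I'm afraid you'll need to either use a different stemming algorithm or write a pipeline function that treats the two as synonyms.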
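To illustrate the synonym idea, here is a minimal sketch in plain Python rather than Lunr itself. It assumes the stems have already been produced by the Porter stemmer (the two values are copied from the session above), and the names `STEM_SYNONYMS`, `normalize_stem`, `build_index`, and `search` are hypothetical helpers made up for this example. The point is only that the same normalization has to run on both the indexed terms and the query terms.

```python
# Hypothetical synonym table: maps a Porter stem to a canonical stem.
# The stems 'compet' and 'competit' come from the NLTK session above.
STEM_SYNONYMS = {"competit": "compet"}


def normalize_stem(stem):
    """Return the canonical form of a stem; unknown stems pass through."""
    return STEM_SYNONYMS.get(stem, stem)


def build_index(stemmed_docs):
    """Build a toy inverted index: canonical stem -> set of document ids."""
    index = {}
    for doc_id, stems in stemmed_docs.items():
        for stem in stems:
            index.setdefault(normalize_stem(stem), set()).add(doc_id)
    return index


def search(index, query_stem):
    """Look up an already-stemmed query term after normalizing it."""
    return sorted(index.get(normalize_stem(query_stem), set()))


# Two one-word documents, represented by their Porter stems.
stemmed_docs = {"compete": ["compet"], "competitive": ["competit"]}
index = build_index(stemmed_docs)

# The stem of 'competitive' now finds both documents.
results = search(index, "competit")
```

In Lunr, the equivalent step would presumably be a pipeline function that runs after the stemmer in both the indexing pipeline and the search pipeline, so that documents and queries are normalized the same way. The table still has to be maintained by hand, which is the trade-off compared with switching to a different stemmer.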
Using Lunr 2.3.5. Here is a small example:
I'm confused about why a search for 'competitive' returns 'competitive' but not 'compete'. I would have expected 'compet' to be the root of both words.
Is this expected behavior for the Porter stemmer algorithm? Is there any way to make a search for 'competitive' return two results: 'compete' and 'competitive'? This probably happens with many words, so I would rather avoid manual fixes for individual words.