pelias / schema

elasticsearch schema files and tooling
MIT License

Consider adding apostrophe tokenfilter #434

Open missinglink opened 4 years ago

missinglink commented 4 years ago

As reported in https://github.com/pelias/pelias/issues/847, we can improve fuzzy-matching by applying an apostrophe tokenfilter.

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-apostrophe-tokenfilter.html

missinglink commented 4 years ago

We're currently removing apostrophe characters in the punctuation filter. The effect of this is to convert mcdonald's => mcdonalds.

I had a play with introducing the apostrophe tokenfilter linked above (and also removing the apostrophe from the punctuation filter). The effect of this is to convert mcdonald's => mcdonald.

What we really need is a method where all three of these forms are considered equivalent:

mcdonald's
mcdonalds
mcdonald
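
To make the gap concrete, here is a minimal Python sketch that imitates the two behaviours described above. It is not Elasticsearch or Pelias code: `remove_apostrophes` and `apostrophe_filter` are hypothetical stand-ins, written on the assumption that the punctuation filter simply deletes the apostrophe and that the apostrophe tokenfilter drops the apostrophe together with everything after it.

```python
# Simplified stand-ins for the two analysis steps discussed in this thread.
# Illustrative only: these functions are assumptions, not Elasticsearch code.

APOSTROPHES = ("'", "\u2019")  # straight and typographic apostrophe


def remove_apostrophes(token: str) -> str:
    """Imitate the punctuation filter: delete the apostrophe, keep the rest."""
    for ch in APOSTROPHES:
        token = token.replace(ch, "")
    return token


def apostrophe_filter(token: str) -> str:
    """Imitate the apostrophe tokenfilter: drop the apostrophe and all text after it."""
    positions = [i for i in (token.find(ch) for ch in APOSTROPHES) if i != -1]
    return token[: min(positions)] if positions else token


FORMS = ["mcdonald's", "mcdonalds", "mcdonald"]

# Map each input form to the token each approach would produce.
punctuation_result = {form: remove_apostrophes(form) for form in FORMS}
apostrophe_result = {form: apostrophe_filter(form) for form in FORMS}

for name, result in [
    ("punctuation filter", punctuation_result),
    ("apostrophe filter", apostrophe_result),
]:
    distinct = sorted(set(result.values()))
    print(f"{name}: {result} -> {len(distinct)} distinct tokens")
```

Under these assumptions each approach still leaves two distinct tokens for the three inputs: the punctuation filter keeps mcdonald apart from the other two, while the apostrophe filter keeps mcdonalds apart. Neither one on its own gives the equivalence asked for above.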