Closed missinglink closed 4 years ago
I've added https://github.com/pelias/schema/pull/457/commits/1baeadfe9529e8ede8207c8a688ee5459efe3c48 to address an issue where the linter was using `/\s/` instead of `[\\s/\\\\-]+` to determine which tokens were multi-word.
As a result I've had to remove a few hyphenated synonyms from the canonical synonym lists.
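For illustration, here is a minimal sketch of how the two patterns differ when classifying tokens. The `isMultiWord` helper and the sample tokens are hypothetical and are not taken from the linter itself; only the two patterns come from the comment above.

```javascript
// Old check: only whitespace marks a token as multi-word.
const OLD_SEPARATOR = /\s/;

// New check, built from the string form quoted above:
// whitespace, forward slash, backslash, or hyphen.
const NEW_SEPARATOR = new RegExp('[\\s/\\\\-]+');

// Hypothetical helper: a token counts as multi-word if it contains a separator.
function isMultiWord(token, separator) {
  return separator.test(token);
}

// Sample tokens chosen for illustration only.
for (const token of ['street', 'main street', 'drive-in']) {
  console.log(
    token,
    'old:', isMultiWord(token, OLD_SEPARATOR),
    'new:', isMultiWord(token, NEW_SEPARATOR)
  );
}
```

Under the old pattern a hyphenated token such as `drive-in` is treated as a single word, whereas the new pattern flags it as multi-word, which is consistent with hyphenated synonyms now failing the lint.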
Nice. Does it make sense to add an integration test for multi-word synonyms?
As discussed in https://github.com/pelias/schema/issues/456, the work in https://github.com/pelias/schema/pull/453 had the unintended consequence of dropping support for multi-word custom synonyms.
My general guidance here is that multi-word synonyms are poorly supported by Lucene/Elasticsearch and so should be avoided where possible. Great care should be taken to ensure they are compatible with the `match_phrase` queries used by Pelias.
Where possible I'd recommend using 'aliases' (i.e. `doc.setNameAlias()`) instead. This is a reliable method of achieving the same thing, although it's far less convenient because it works on a per-record basis.
Having said all that, this PR re-enables support for multi-word synonyms (in custom synonym files only) in order to avoid breaking backwards compatibility.
In order to do this I had to move the custom multi-word synonyms outside the multiplexer. Apparently multiplexers emit tokens one-by-one to each of their branches, which prevents the 'look-ahead' required by any multi-term analysis within a branch.
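The shape of that change can be sketched roughly as follows. This is an assumption-laden illustration, not the actual Pelias schema: every filter name, analyzer name, and synonym entry below is hypothetical. It only shows a multi-word `synonym` filter placed in the analyzer's filter chain ahead of a `multiplexer`, rather than inside the multiplexer's own `filters` list.

```javascript
// Hypothetical Elasticsearch analysis settings (names are illustrative only).
const settings = {
  analysis: {
    filter: {
      // Multi-word synonyms need to look ahead across tokens,
      // so this filter is kept outside the multiplexer.
      example_custom_multiword: {
        type: 'synonym',
        synonyms: ['mount saint helens, mt st helens']
      },
      // Single-word synonyms can still run inside a multiplexer branch.
      example_single_word: {
        type: 'synonym',
        synonyms: ['fort, ft']
      },
      example_multiplexer: {
        type: 'multiplexer',
        filters: ['example_single_word']
      }
    },
    analyzer: {
      example_analyzer: {
        type: 'custom',
        tokenizer: 'whitespace',
        // Order matters: the multi-word filter runs before the multiplexer.
        filter: ['lowercase', 'example_custom_multiword', 'example_multiplexer']
      }
    }
  }
};

console.log(JSON.stringify(settings, null, 2));
```

Keeping the multi-word filter in the main chain means it sees the full token stream, at the cost of no longer being isolated to a single multiplexer branch.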
resolves https://github.com/pelias/schema/issues/456