pelias / schema

elasticsearch schema files and tooling
MIT License

country code synonyms #472

Closed missinglink closed 3 years ago

missinglink commented 3 years ago

Branched from https://github.com/pelias/schema/pull/471; please merge that PR first (see diff).

This PR:

This work solves the issue outlined in https://github.com/pelias/schema/pull/469. However, it could come with some unwanted side effects, so we should discuss them before merging.

Ideally these synonyms would only be applied to the parent.country_a field and not other peliasAdmin fields such as parent.region (for example).

Accomplishing that would require a bit of refactoring. This may be preferable, since it would stop synonyms like ST,STP from this new file conflicting with regions prefixed with 'Saint', for instance.
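To make the ST,STP concern concrete, here is a toy sketch of synonym expansion. It is not Elasticsearch's synonym filter, only a simplified model of one; the `GB,GBR` line and the "St. Louis" region are illustrative assumptions, and `parse_synonyms` / `analyze` are hypothetical helpers.

```python
def parse_synonyms(lines):
    """Parse 'a,b' equivalence lines into a term -> group lookup table."""
    table = {}
    for line in lines:
        group = [term.strip().lower() for term in line.split(",") if term.strip()]
        for term in group:
            table.setdefault(term, set()).update(group)
    return table


def analyze(text, synonyms=None):
    """Lowercase, strip periods, split on whitespace, then expand synonyms."""
    tokens = set()
    for token in text.lower().replace(".", "").split():
        tokens.add(token)
        if synonyms:
            tokens |= synonyms.get(token, set())
    return tokens


# Illustrative synonym lines; only ST,STP is taken from the discussion above.
country_synonyms = parse_synonyms(["ST,STP", "GB,GBR"])

# Shared analyser: the region name picks up the country-code synonym.
region_tokens_shared = analyze("St. Louis", country_synonyms)

# Separate analyser without the country synonyms: no spurious token.
region_tokens_separate = analyze("St. Louis")
```

With the shared analyser, a query for "stp" could match a region such as "St. Louis", which is the kind of conflict a field-specific analyser would avoid.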

Each of the address_parts.*, name.*, etc. fields currently has its own analyser, but the parent.* fields all share a common analyser. It may be time to give each parent.* field its own analyser so they can be configured independently.
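A rough sketch of what per-field analysers could look like, written as Python that builds the settings dictionary. Everything named here is an assumption for illustration: the base definition, the `peliasAdmin_<field>` naming scheme, and the `country_code_synonyms` filter name are hypothetical and not taken from the schema.

```python
import copy

# Hypothetical base definition; the real peliasAdmin settings are not shown here.
BASE_ADMIN_ANALYZER = {
    "type": "custom",
    "tokenizer": "standard",
    "filter": ["lowercase"],
}


def build_parent_analyzers(fields, extra_filters):
    """Return one analyser definition per parent field.

    Each definition is a deep copy of the base, so appending a filter for
    one field never leaks into another field's analyser.
    """
    analyzers = {}
    for field in fields:
        definition = copy.deepcopy(BASE_ADMIN_ANALYZER)
        definition["filter"] += extra_filters.get(field, [])
        analyzers["peliasAdmin_" + field] = definition
    return analyzers


# Only country_a receives the (hypothetically named) synonym filter.
analyzers = build_parent_analyzers(
    ["country_a", "region"],
    {"country_a": ["country_code_synonyms"]},
)
```

The point of the design is that parent.region keeps the plain analyser while parent.country_a alone gets the country code synonyms.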

One other issue I noticed when developing this is that the admin partial uses "search_analyzer": "peliasAdmin" when it should really use "search_analyzer": "peliasQuery". This means that synonyms are being applied at both index-time and search-time.
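For reference, the suggested change to the admin partial might look roughly like the following, again expressed as Python that builds the mapping fragment. The `"type": "text"` value and the `admin_field` helper are assumptions; only the two analyser names come from the comment above.

```python
def admin_field(index_analyzer="peliasAdmin", search_analyzer="peliasQuery"):
    """Build a mapping fragment with distinct index and search analysers."""
    return {
        "type": "text",  # assumed field type
        "analyzer": index_analyzer,  # applied to documents at index time
        "search_analyzer": search_analyzer,  # applied to the query text
    }


# Current behaviour described above: the same analyser on both sides.
current = admin_field(search_analyzer="peliasAdmin")

# Proposed behaviour: synonyms expanded at index time only.
proposed = admin_field()
```

Using a separate search analyser means the synonym expansion happens once, on the indexed side, rather than on both the document and the query.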

For the parent.*.ngram fields we index with "analyzer": "peliasIndexOneEdgeGram", which is shared with the name.* fields, so adding the synonyms there would also apply them to the name.* fields. This could be avoided if each of these sub-fields had its own ngram analyzer, separate from the main peliasIndexOneEdgeGram analyzer.

Let's discuss on a call.

missinglink commented 3 years ago

note: we may want to add some non-standard synonyms such as UK,GB

orangejulius commented 3 years ago

I ran a quick WOF-only build with this PR, and can confirm that it's enough to make the city autocomplete test in https://github.com/pelias/acceptance-tests/pull/537 flip to passing.


orangejulius commented 3 years ago

Just repeating and expanding on my comment in https://github.com/pelias/schema/pull/471#issuecomment-772656955 over here:

It seems like we want to keep the basic structure of this PR, where we use synonyms to handle either the 2 or 3 letter variants of country codes, with two changes:

Otherwise, looks like this will fix a nice case for us and shouldn't really break anything else 👍

missinglink commented 3 years ago

superseded by https://github.com/pelias/schema/pull/473