pelias / api

HTTP API for Pelias Geocoder
http://pelias.io
MIT License

Support filtering against `dependency_a` when using `boundary.country` parameter #1622

Closed orangejulius closed 2 years ago

orangejulius commented 2 years ago

The boundary.country parameter is supported on all major Pelias API endpoints and adds a hard filter based on a given 2- or 3-character country code.

The behavior is pretty straightforward, but the underlying data model for country-like parents is not quite so simple.

While most records have a parent.country_a property with the country code of a parent, a few have parent.dependency_a. This includes places like US territories outside the 50 states, the French Overseas Territories, and various others that have their own country codes but may not be fully sovereign countries.
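To make the data model concrete, here is a minimal sketch of the two record shapes described above. The interface names, the helper, and the sample values are all made up for illustration, and it assumes the parent properties are array-valued; real records carry many more fields.

```typescript
// Illustrative record shapes only; names and values are made up for this example.
interface Parent {
  country_a?: string[];
  dependency_a?: string[];
}

interface PlaceRecord {
  name: string;
  parent: Parent;
}

// Hypothetical helper: collect every "country-like" code a record carries,
// regardless of which of the two properties holds it.
function countryLikeCodes(record: PlaceRecord): string[] {
  const { country_a = [], dependency_a = [] } = record.parent;
  return [...country_a, ...dependency_a];
}

// A record whose country-like parent is a sovereign country.
const sovereign: PlaceRecord = { name: 'Paris', parent: { country_a: ['FRA'] } };

// A record whose country-like parent is a dependency with its own code.
const dependency: PlaceRecord = { name: 'San Juan', parent: { dependency_a: ['PRI'] } };
```

A filter that only inspects country_a would never see the code on the second record, which is the gap this PR is about.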

In practice, however, this distinction isn't super useful to most people. See the country_code parameter we added in https://github.com/pelias/api/pull/1541 for another case where it's helpful to provide an interface that glosses over some of the details.

Functionality

This PR makes the boundary.country param look for a matching country code in either the parent.dependency_a property or parent.country_a, using an Elasticsearch multi_match query. Previously only parent.country_a was considered.

There's really nothing tricky to it: the multi_match query allows either field to contain the desired country code.
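As a rough sketch of what such a filter could look like as an Elasticsearch query body (not necessarily the exact query this PR generates), the clause below searches both fields for one code. The helper names and the sample main query are hypothetical, and the code is assumed to be already validated and normalized upstream.

```typescript
// The two fields named in this PR; the exact fields are used, not the .ngram variants.
const COUNTRY_FIELDS = ['parent.country_a', 'parent.dependency_a'];

// Hypothetical helper: a multi_match clause that matches when either field
// contains the given country code.
function boundaryCountryFilter(code: string) {
  return {
    multi_match: {
      query: code,
      fields: COUNTRY_FIELDS,
    },
  };
}

// Hypothetical wrapper: put the clause in a bool query's filter context so it
// restricts the result set without contributing to relevance scoring.
function withBoundaryCountry(mainQuery: object, code: string) {
  return {
    query: {
      bool: {
        must: [mainQuery],
        filter: [boundaryCountryFilter(code)],
      },
    },
  };
}

// Illustrative main query; the field and text are placeholders.
const body = withBoundaryCountry({ match: { 'name.default': 'san juan' } }, 'PRI');
```

Serializing `body` with `JSON.stringify` gives a request body that could be sent to a search endpoint.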

As a side note, I realized that our autocomplete queries were using the parent.country_a.ngram field due to our work in https://github.com/pelias/api/pull/1264. While this should make little practical difference, it's technically not needed, as the boundary.country param doesn't need to consider partial inputs. As a side effect, this PR fixes that ever-so-slight departure from the ideal.

A note on implementation

As it stands, this PR includes a departure from the previous implementation, which used a boundary.country-specific view in the pelias-query library.

Instead, the generic multi_match view is used, and all the Pelias API-specific logic is contained here. It also means there's no need to release a companion PR to pelias-query. In the past we've found the dance of developing PRs to API and pelias-query in parallel to be a bit of extra work, and this pattern would eliminate that need. I'd love to see us moving towards having all the Pelias-specific query logic here in the API, while pelias-query contains only or mostly Elasticsearch-specific stuff.
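The split described above could be sketched roughly like this. Everything here is a hypothetical stand-in: the `Vars` map, the variable key names, and both functions are invented for illustration and are not the actual pelias-query or API code.

```typescript
// Hypothetical stand-in for a query-variable store; not the pelias-query API.
type Vars = Map<string, unknown>;
type View = (vars: Vars) => object | null;

// Generic, Elasticsearch-only building block: it knows how to emit a
// multi_match clause but nothing about Pelias parameters.
function multiMatchView(prefix: string): View {
  return (vars) => {
    const input = vars.get(`multi_match:${prefix}:input`);
    const fields = vars.get(`multi_match:${prefix}:fields`);
    // Returning null lets the caller skip the clause when the parameter is absent.
    if (typeof input !== 'string' || !Array.isArray(fields)) return null;
    return { multi_match: { query: input, fields } };
  };
}

// API-side, Pelias-specific wiring: decides which fields boundary.country searches.
function setBoundaryCountry(vars: Vars, code: string): void {
  vars.set('multi_match:boundary_country:input', code);
  vars.set('multi_match:boundary_country:fields', ['parent.country_a', 'parent.dependency_a']);
}

// The API composes the generic view with its own prefix.
const boundaryCountryView = multiMatchView('boundary_country');
```

With this shape, changing which fields the filter covers only touches the API-side function, so no companion change to the generic view is needed.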

That said, it does mean that the way this filter parameter works is now different from most of the others, so it's worth discussing whether that's ok with us.

Still to come

The /v1/reverse endpoint technically supports boundary.country as well (though the use case is fairly limited). This will come in a subsequent PR, because the "coarse" part of the reverse endpoint doesn't support boundary.country at all. It might make sense to have further discussion there, including the possibility of removing the boundary.country param from the reverse endpoint.