pelias / schema

elasticsearch schema files and tooling
MIT License
40 stars 75 forks source link

adding support for optionally customizing the document model from pelias.json #493

Open lokkju opened 1 year ago

lokkju commented 1 year ago

:wave: I added the ability to customize the document model from pelias.json


Here's the reason for this change :rocket:

In order to allow the return of shape data from elasticsearch when it's imported, this changeset allows overriding the document model configuration via pelias.json. Specifically, I needed to be able to override the _source.excludes array to not include shape.


Here's what actually got changed :clap:


Here's how others can test the changes :eyes:

I added an appropriate test

also, the full-stack of pelias changes and settings options to enable shape ingestion from whosonfirst, shape data return from the api, and boundary display on the map, is available at https://github.com/lokkju/pelias-us-shape-autocomplete

missinglink commented 1 year ago

Hi @lokkju thanks for the PR.

The shape field already exists to accommodate storing a geometry for display, but we never fully supported it since the center_point field is the default for display.

I'm wondering if instead of making this configurable that instead, we remove that field from the excludes list.

I think this would mean that queries will return shape: undefined for records where it isn't defined but that wouldn't be a huge performance issue and would allow us to have a single schema to maintain, which I think would be valuable since all pelias instances would have a compatible schema.

@orangejulius, @Joxit do you have opinions on removing this field from the source excludes list?

lokkju commented 1 year ago

This change allows the shape to be removed from the excludes list, without forcing the change on everyone. By default, I imagine we'd like to keep the current default behavior of not returning the shape field, since it can be extremely large.

While queries can choose to exclude fields at the query level (and my pelias/api modification do exactly that for all except the place query), i thought it'd be best to keep current behavior.

To be clear, the example configuration project I shared doesn't change the contents of the schema, it just overwrite the excludes field to only include phrase.

missinglink commented 1 year ago

I imagine we'd like to keep the current default behavior of not returning the shape field, since it can be extremely large

Are we currently importing shapes in the default setup for importers such as who's on first?

I assumed that if we didn't import any shape data then the fields would be empty and so therefore not introduce any overhead?

lokkju commented 1 year ago

We are not, at least not from WOF; my changeset makes it an option for the WOF, but does not make it the default.

My concern is for any other client tools that may be using/querying the ES instance, besides the pelias-api; if they don't exclude the shape field in their query, it could cause an unexpected impact.

If there is a consensus that the use of other clients than the API isn't a concern, I guess we could just make this change in the document as you suggest; then the shape field only will have data and a performance impact if someone chose to enable it on import.

I was trying to follow the existing practice in the schema project, as was done with the ES settings object, of allowing extensive customization from within the main config file.

Joxit commented 1 year ago

Hi there, AFAIK pelias do not import shapes / we cannot import shapes in ES. So making it by default shouldn't be a big deal for most of them. But you're right @lokkju, if someone created it's own importer that uses the shape, it may break something... (I have some custom importer for example, they do not use shape though)

So my final word is, adding a configuration for exclude may be safer. 😊

lokkju commented 1 year ago

Is there anything I can do to help encourage movement here?

lokkju commented 9 months ago

Following up again, I'd really like to get this merged

lokkju commented 2 months ago

And I'd still like to get this up-streamed. Anything I can do to move it along?