pelias / wof-admin-lookup

Who's on First Admin Lookup for the Pelias Geocoder
https://pelias.io
MIT License
9 stars 24 forks source link

Enable postalcode lookup #310

Open octmoraru opened 2 years ago

octmoraru commented 2 years ago

:wave: I did some work for the Pelias project and would love for everyone to have a look at it and provide feedback.


Here's the reason for this change :rocket:

Currently wof-admin-lookup does not support lookups in the postalcode layer.


Here's what actually got changed :clap:

This diff enables lookup in the postalcode layer, guarded by a config flag. Because the postal code data can amount to a significant size (~12.6 GB for the global postalcode dataset as of today), this setting is disabled by default.

The new lookupPostalCode setting affects only the "local resolver" mode. If it's set to true, it will start the postalcode layer worker. The layer is considered "untrusted", just like the neighbourhood layer.


Here's how others can test the changes :eyes:

I've tested this diff locally by using pip-service:

On my machine, the entire global postal code dataset was loaded in ~6 minutes:

2022-02-08T11:23:05.820Z - info: [wof-pip-service:master] postalcode worker loaded 421118 features in 370.922 seconds
2022-02-08T11:23:06.440Z - info: [wof-pip-service:master] PIP Service Loading Completed!!!
missinglink commented 2 years ago

Worth mentioning that there are two 'postcode' properties in Pelias, the most common being an 'address property', the other is in the admin hierarchy which allows records to be parented by a postcode but has never really been used.

missinglink commented 2 years ago

For that reason it might be required to 'special-case' postcode such that the hierarchy label is duplicated in the address properties. IIRC all the search queries target address_parts.zip.

orangejulius commented 2 years ago

Nice, this feature has been quite in-demand over the years. Postal codes can be very tricky for admin lookup, which is part of why we've held off, but I don't think it hurts anything to merge it with the default set to disabled.

Some ideas of future work that we might want (they should all come with further discussion):

orangejulius commented 2 years ago

@octmoraru one question, when you tested this, did the PIP service downloader give you any issues? As I recall it's set to default to admin only (it won't download the postalcode databases):

https://github.com/pelias/pip-service/blob/f89c0f1e0796a388f1000629ff8113ddbc2db295/bin/download#L3

octmoraru commented 2 years ago

Thanks @missinglink and @orangejulius for your quick reactions!

For that reason it might be required to 'special-case' postcode such that the hierarchy label is duplicated in the address properties. IIRC all the search queries target address_parts.zip.

You mean doing that in the importers, right? Indeed, that would be quite neat as it has the potential to fill some gaps in source dataset. However, even with the current implementation, not all is lost. If I read this code correctly, at runtime, for most (all?) endpoints parent.postalcode will be renamed to postalcode, as will address_parts.zip, with the latter being preferred, if present. So, even without any changes to the importers, WOF postal code information will be visible in the results, for those entries that lacked this info in the source dataset. Please correct me if I'm wrong :)

Enabling postalcode lookup on a per-country basis (it makes more sense in some places than others)

Yeah, that sounds reasonable. I guess that would be quite easy to implement by adding a new setting and changing the logic in src/pip/readStream.js

I'll just link some relevant issues too, since we've discussed a lot of this before:

Thanks for sharing, it didn't cross my mind to search in pelias/pelias 🤦

@octmoraru one question, when you tested this, did the PIP service downloader give you any issues? As I recall it's set to default to admin only (it won't download the postalcode databases):

I guess I've been using the whosonfirst downloader directly which by default downloads everything 😀 https://github.com/pelias/whosonfirst/blob/22f7ad76dbf65bf25f0f4a2af352ad45b72a2a94/bin/download

octmoraru commented 2 years ago

Hi @orangejulius - what else would you like to see in this PR to have it merged? 🙂

missinglink commented 2 years ago

Have you tested that this works as expected? I believe that this alone will not fix any issues in Pelias results since the PIP service is responsible for setting these fields but not this one, which is used for search and display.

orangejulius commented 2 years ago

Ah, yeah so as @missinglink points out there are a couple objectives that one might wish to achieve:

Adding postal codes to address or other records

Having postal code geometries available via the PIP service is a prerequisite for this, but not enough to solve this issue on its own. Additional work would be required to make sure that, at import time, postal codes are added to the appropriate fields on each individual address record.

Allowing coarse reverse geocoding to return postalcodes

This would let one look up the postalcode for a given lat/lon, assuming the postalcode geometry is trustworthy. It sounds like @octmoraru says this works after this PR.

In my estimation, that's the less common of the two needs, but it's better than nothing. 12.6GB of RAM is quite a bit, so it might not be a good idea to enable this feature in many cases. I'm on the fence if supporting something that will be a non-recommended configuration is worth it.

octmoraru commented 2 years ago

Indeed, my primary intent for this PR was to make postalcodes available in /v1/reverse?layers=coarse(or ?layers=postalcode) responses. I realise now that I need to change this array to include postalcode in order for it to actually work.

Additional work would be required to make sure that, at import time, postal codes are added to the appropriate fields on each individual address record.

True, @orangejulius @missinglink - how would you feel if I were to duplicate parent.postalcode into address_parts.zip, somewhere around here? I think that should do the trick for all importers, right?

missinglink commented 2 years ago

I feel like the topic of whether algorithmically assigned postcodes are considered equal to authoritative postcodes is subjective, regional and domain-specific.

Some installations may prefer the existing behavior where we only display authoritative postcodes, some may prefer to have increased coverage at the cost of the potential inaccuracies.

For that reason I would like to put it behind a pelias/config 'flag' which your org can opt-in to immediately but others can elect to opt in or not, and punt the decision of which is the default for later, once we've evaluated the international quality implications.

Another option which we've discussed in the past is to introduce a new field called postcode_quality (or whatever we name it) which indicates that these fields are populated algorithmically from polygons rather than supplied in the source data.

This second option would allow us to make this the default immediately but indicate to the consumer a criteria which they could filter on after parsing the geojson result.

octmoraru commented 2 years ago

Thanks for making it clearer for me! So out of these two options:

For that reason I would like to put it behind a pelias/config 'flag' which your org can opt-in to immediately but others can elect to opt in or not, and punt the decision of which is the default for later, once we've evaluated the international quality implications.

and

Another option which we've discussed in the past is to introduce a new field called postcode_quality (or whatever we name it) which indicates that these fields are populated algorithmically from polygons rather than supplied in the source data.

which one would you prefer? I can probably work at implementing either one of them in the near future.

missinglink commented 2 years ago

I think the two solutions aren't really different, more that the latter is an extension of the former.

So at minimum:

  1. User can choose whether postcodes are downloaded and ingested into the PIP server.
  2. A config option is added to enable/disable setting address_parts.zip from PIP results (obviously this would be a no-op if 1. didn't import anything), the default value is false.

In this situation you & others can elect to opt-in to this feature in their installation of Pelias, for testing, production, or whatever by setting the value to true.

Then we live with it for a few months, hopefully you provide some feedback about how well it worked out for you and we evaluate how good/bad it is.

Then later this year we approach the idea of making it the default setting.

  1. At this stage the default would change from false to true and at the same time we
  2. introduce something in the response which indicates whether the postcode was authoritative or algorithmically assigned.

I think you should focus on the first two steps since it solves your business need and also is easy to merge since it will be a no-op for everyone else (for now).