pelias / interpolation

global street address interpolation service (beta)
https://interpolation.demo.geocode.earth
MIT License

street.db: Can we add un-normalized / not lowercased street names? #223

Closed: arne-cl closed this issue 4 years ago

arne-cl commented 4 years ago

When experimenting with /street/near/ queries for our use case (reverse geocoding on highways), I found that all resulting street names are lower-cased.

This can easily be fixed (like this), but I was wondering whether this would negatively impact other use cases of the interpolation service.

missinglink commented 4 years ago

Hi Arne,

I don't think it would have any negative impact.

The idea here is that the street name is put through the same normalization algorithm at both index-time and query-time (which in this case is libpostal).

The nice thing about that approach is that the input and database tokens are 'symmetrical', so they match much more easily than un-normalized forms such as Foo Straße and Foostr. would.
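As a rough illustration of the symmetry described above, here is a minimal Python sketch. The `normalize` function is a toy stand-in written for this example, not libpostal and not the service's actual code; it only handles the `Straße` / `str.` case from the comment. The `index` dictionary and `lookup` helper are likewise hypothetical.

```python
import re


def normalize(street: str) -> str:
    """Toy stand-in for a real normalizer such as libpostal (assumption)."""
    s = street.lower().replace("ß", "ss")         # lowercase, fold the sharp s
    s = re.sub(r"str\.?$", "strasse", s)          # expand a trailing "str" / "str."
    s = re.sub(r"\s+strasse$", "strasse", s)      # join "foo strasse" -> "foostrasse"
    return s.strip()


# Index time: store the normalized token as the key.
# Keeping the original spelling as the value is what this issue asks about.
index = {normalize(name): name for name in ["Foo Straße"]}


def lookup(query: str):
    """Query time: run the same normalization before matching."""
    return index.get(normalize(query))
```

Because both sides pass through the same function, `lookup("Foostr.")` finds the entry indexed as `Foo Straße`, even though the raw strings differ.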

Part of that normalization is lowercasing, as you noticed, so information about the original form (casing, expansion and composition) is lost.

I think adding the original form of the street name in the first position is a reasonable solution, although I don't think we'll be able to merge it to master because it'll increase the index size for a feature which 'Pelias proper' doesn't require.

One other solution you could consider is adding a new column to the table (or a new table) which contains the additional information you need. If you do that, you can likely declare the column COLLATE NOCASE in order to allow case-insensitive string matching.
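A minimal sketch of the separate-table variant, using Python's built-in `sqlite3` module. The table and column names (`names_normalized`, `names_original`, `id`, `name`) are invented for this example and are not the actual street.db schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical table names."""
    conn = sqlite3.connect(":memory:")
    # Normalized (lowercased) names, standing in for the existing index.
    conn.execute("CREATE TABLE names_normalized (id INTEGER, name TEXT)")
    # Extra table keeping the original spelling; NOCASE makes '=' case-insensitive.
    conn.execute(
        "CREATE TABLE names_original (id INTEGER, name TEXT COLLATE NOCASE)"
    )
    conn.execute("INSERT INTO names_normalized VALUES (1, 'bundesautobahn 7')")
    conn.execute("INSERT INTO names_original VALUES (1, 'Bundesautobahn 7')")
    conn.commit()
    return conn


def find_original(conn: sqlite3.Connection, query: str) -> list:
    """Return original-cased names matching the query, ignoring ASCII case."""
    rows = conn.execute(
        "SELECT name FROM names_original WHERE name = ?", (query,)
    ).fetchall()
    return [row[0] for row in rows]
```

With this setup, `find_original(build_demo_db(), "BUNDESAUTOBAHN 7")` matches the stored `Bundesautobahn 7`. Note that SQLite's built-in NOCASE collation only folds ASCII letters, so names containing characters such as `Ä` or `ß` would still need normalization for matching.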

missinglink commented 4 years ago

For your specific use-case I'd suggest building a copy of the Valhalla routing engine and checking out its reverse search capabilities. It may be a better fit than a geocoder.
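A hedged sketch of what such a request might look like, assuming the reverse search meant here is Valhalla's `/locate` endpoint and that a self-built instance is listening on `http://localhost:8002` (both are assumptions, not confirmed in this thread). The helper names are hypothetical, and the code only builds the request URL without sending it.

```python
import json
from urllib.parse import urlencode


def locate_payload(lat: float, lon: float, costing: str = "auto") -> dict:
    """Build the JSON body for a locate request at one coordinate."""
    return {
        "locations": [{"lat": lat, "lon": lon}],
        "costing": costing,   # travel mode used to pick candidate edges
        "verbose": True,      # ask for detailed edge information
    }


def locate_url(base_url: str, lat: float, lon: float) -> str:
    """Encode the payload into a GET URL (base_url is an assumed local server)."""
    query = urlencode({"json": json.dumps(locate_payload(lat, lon))})
    return f"{base_url.rstrip('/')}/locate?{query}"
```

For example, `locate_url("http://localhost:8002", 52.5, 13.4)` yields a URL that could be fetched with any HTTP client once a Valhalla server with suitable tiles is running.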

missinglink commented 4 years ago

It's super old now but here's an example of the data which Valhalla returned for a reverse search 3 years ago: https://gist.github.com/missinglink/b2ac67f51d132b591868a9ef60061c43

arne-cl commented 4 years ago

Dear Peter, thanks a lot for pointing me towards Valhalla! We might need to use it anyway for map-matching incoming GPS positions.

I'll close the issue, as inflating the size of the database for a single use case doesn't make sense. When I find a good solution, I'll comment here again for future reference.