pelias / openaddresses

Pelias import pipeline for OpenAddresses.
MIT License

street_name_normalization: contract english diagonals #486

Closed missinglink closed 3 years ago

missinglink commented 3 years ago

as a follow-on from https://github.com/pelias/openaddresses/pull/477, and pairing with https://github.com/pelias/openstreetmap/pull/560, this PR contracts English diagonals in street names.

it only targets four terms, ["southeast", "southwest", "northeast", "northwest"]; when those tokens are found they are replaced with the contracted forms [SE, SW, NE, NW].

as such it's pretty safe
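
For illustration only, the whole-token replacement described above could be sketched roughly as follows. This is not the code from the PR (pelias/openaddresses is a JavaScript project); the `DIAGONALS` table and the `contract_diagonals` helper are hypothetical names, and the sketch assumes matching is case-insensitive and applies only to complete whitespace-separated tokens.

```python
# Hypothetical mapping built from the four terms listed in the PR description.
DIAGONALS = {
    "southeast": "SE",
    "southwest": "SW",
    "northeast": "NE",
    "northwest": "NW",
}


def contract_diagonals(street: str) -> str:
    """Replace whole-token English diagonals with their contracted form.

    Assumption: tokens are separated by whitespace and compared
    case-insensitively. Runs of whitespace collapse to a single space.
    """
    tokens = street.split()
    # Only exact token matches are replaced; anything else passes through.
    return " ".join(DIAGONALS.get(token.lower(), token) for token in tokens)


print(contract_diagonals("Southeast Main Street"))
print(contract_diagonals("Northwestern Avenue"))
```

Because only the four full words are matched, single-letter tokens such as the "N" in "N Street" and longer words such as "Northwestern" pass through unchanged, which is the sense in which the change is low risk.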

orangejulius commented 3 years ago

Nice, this makes sense. Definitely good for consistency.

I recall from our investigation into abbreviations a few months back that there aren't too many cases of expanded directionals in OA data, meaning this won't have too much effect, right? But the effect it does have is good as it will make sure all the directionals are the same across OA and OSM.

orangejulius commented 3 years ago

Oh, also, this replaces https://github.com/pelias/openaddresses/pull/479, right?

missinglink commented 3 years ago

Oh funny, I totally forgot I already did something similar in https://github.com/pelias/openaddresses/pull/479

Looking at that PR it differs slightly in that:

I think in both cases there is a risk of introducing some error; things like 'N Street' and 'S Road' come to mind, and any usage of 'se' which doesn't mean southeast.

I think this PR is a lot safer, which is probably why the other one got a bit stuck getting merged. I'll close the other one.

> Nice, this makes sense. Definitely good for consistency.

Yeah, that's the main value of this, so we handle diagonal contractions the same in OA and OSM