pelias / model

Pelias data models
6 stars 17 forks source link

add alias for compound street names with abbreviated generic #145

Closed missinglink closed 2 years ago

missinglink commented 2 years ago

implementation of https://github.com/pelias/model/issues/144

this method is slightly different from previous in that all versions of the name (including aliases) are considered rather than only the primary name.

note: this method produces duplicate names, the subsequent post processing step 'deduplication' handles removing duplicate entries so I felt it wasn't necessary to do so within this script, at the cost of having duplicate entries in the tests.

resolves https://github.com/pelias/model/issues/144 resolves https://github.com/pelias/api/issues/1594

orangejulius commented 2 years ago

Cool, from the test output this looks perfect. Next step is probably to run a small (or maybe planet build) and see how this compares in some of our relevant test cases. The new one from https://github.com/pelias/api/issues/1594 should be especially helpful.

missinglink commented 2 years ago

opening this up for review/merge.

there were a couple things I wasn't quite happy about but decided not to tackle in this PR:

missinglink commented 2 years ago

worth noting that more computation will be required than before since we are operating on lists now rather than scalar values, I think this is preferable in order to produce more permutations, but it may result in a very slight index-time perf slowdown for affected records (ie. DE/CH/AT/NL plus address/street/intersection).

missinglink commented 2 years ago

@orangejulius before merging this we might want to quickly discuss the config and consider expanding it a little more (or not?), it's no extra 'work' per-se, more of a question of completeness/coverage.

missinglink commented 2 years ago

I didn't add an appropriate semantic-release message to https://github.com/pelias/model/pull/147 so it will be included in this release.

missinglink commented 2 years ago

This looks good on the dev environment:

Screenshot 2022-01-13 at 12 47 36

Happy to merge this as-is, there are a couple more potential separable street suffixes in https://github.com/openvenues/libpostal/blob/master/resources/dictionaries/de/concatenated_suffixes_separable.txt but I don't feel like we need to add any of those at this stage since they're less common.

missinglink commented 2 years ago

need to add acceptance-tests