Closed by missinglink 2 years ago.
Cool, from the test output this looks perfect. The next step is probably to run a small build (or maybe a planet build) and see how this compares on some of our relevant test cases. The new one from https://github.com/pelias/api/issues/1594 should be especially helpful.
opening this up for review/merge.
there were a couple of things I wasn't quite happy about but decided not to tackle in this PR:
test/post/seperable_street_names.js
I have enumerated all the common permutations of a German street name; two of these forms are not covered by this functionality due to a lack of full-token synonym substitution. I went back and forth on this and decided that, since we already have the pelias/schema synonyms and the index-time street name cleanup script, it would be messy and error-prone to duplicate that functionality here. I added a code comment with a little more info.
expansions/contractions
These names (which I probably chose in the first place 😝) make me think of conversion to/from abbreviations, whereas the functionality here is about separating/combining compound words. I would be :+1: for finding better names, although I doubt anyone except me really cares about this 😆
Worth noting that more computation will be required than before, since we are now operating on lists rather than scalar values. I think this is preferable in order to produce more permutations, but it may result in a very slight index-time perf slowdown for affected records (i.e. address/street/intersection records in DE/CH/AT/NL).
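For anyone skimming, here is a rough sketch of the kind of separate/combine permutations being discussed. It is not the code from this PR: `SEPARABLE_SUFFIXES` and `separableVariants` are hypothetical names and the suffix list is illustrative only. Like the PR, it does no full-token synonym substitution, so abbreviated forms are left to the pelias/schema synonyms. Note that it returns a list rather than a single value, which is where the extra index-time work comes from.

```javascript
// Hypothetical suffix list (lowercase); illustrative only, not the pelias/model config.
const SEPARABLE_SUFFIXES = ['straße', 'strasse', 'platz'];

// Return the original name followed by any separated/combined variants.
// Variants are lowercased; synonym/abbreviation handling is out of scope.
function separableVariants(name) {
  const lower = name.toLowerCase();
  const variants = [name];

  for (const suffix of SEPARABLE_SUFFIXES) {
    // Skip names that don't end in this suffix or consist only of the suffix.
    if (!lower.endsWith(suffix) || lower.length === suffix.length) continue;

    const head = lower.slice(0, -suffix.length);

    if (head.endsWith(' ') || head.endsWith('-')) {
      // Separated form: "haupt straße" / "haupt-straße" -> "hauptstraße"
      const stem = head.slice(0, -1).trimEnd();
      if (stem) variants.push(stem + suffix);
    } else {
      // Concatenated form: "hauptstraße" -> "haupt straße"
      variants.push(`${head} ${suffix}`);
    }
  }

  return variants;
}

console.log(separableVariants('Hauptstraße'));  // [ 'Hauptstraße', 'haupt straße' ]
console.log(separableVariants('Haupt-Straße')); // [ 'Haupt-Straße', 'hauptstraße' ]
```

This naive version will also join short prefixes, e.g. "Am Platz" becomes "amplatz", so a real implementation would probably want a guard for that.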
@orangejulius before merging this we might want to quickly discuss the config and consider expanding it a little more (or not?). It's no extra 'work' per se, more a question of completeness/coverage.
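To make the "expanding the config" question concrete, here is a minimal sketch assuming the config is plain data: the DE/CH/AT/NL and address/street/intersection scope from the performance note, plus a suffix list. The object shape and the `isAffected`/`withExtraSuffixes` helpers are hypothetical, since the actual config isn't shown in this thread. The point is only that widening coverage means adding entries, not writing new code.

```javascript
// Hypothetical config shape; the real pelias/model config may differ.
const config = {
  countries: ['DE', 'CH', 'AT', 'NL'],
  layers: ['address', 'street', 'intersection'],
  suffixes: ['straße', 'strasse', 'platz']
};

// True when a (hypothetical) { country, layer } record is in scope.
function isAffected(doc, cfg = config) {
  return cfg.countries.includes(doc.country) && cfg.layers.includes(doc.layer);
}

// Return a new config with extra suffixes merged in. Extra suffixes are
// lowercased and duplicates dropped; the input config is not modified.
function withExtraSuffixes(cfg, extra) {
  const merged = new Set(cfg.suffixes);
  for (const suffix of extra) merged.add(suffix.toLowerCase());
  return { ...cfg, suffixes: [...merged] };
}

console.log(isAffected({ country: 'DE', layer: 'street' })); // true
console.log(isAffected({ country: 'DE', layer: 'venue' }));  // false
```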
I didn't add an appropriate semantic-release message to https://github.com/pelias/model/pull/147, so it will be included in this release.
This looks good on the dev environment.
Happy to merge this as-is. There are a couple more potential separable street suffixes in https://github.com/openvenues/libpostal/blob/master/resources/dictionaries/de/concatenated_suffixes_separable.txt, but I don't think we need to add any of those at this stage, since they're less common.
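If we do want to pull in more of those later, a small helper could list dictionary entries we don't already cover. This sketch assumes the libpostal dictionary format is one entry per line with alternative spellings separated by `|`; treat that as an assumption and check the file before relying on it. `parseDictionary` and `newCandidates` are hypothetical helpers, and the sample text is made up, not the file's contents.

```javascript
// Parse dictionary text, ASSUMING one entry per line with alternative
// spellings separated by '|'. Blank lines (and '#' lines, defensively) are skipped.
function parseDictionary(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith('#'))
    .map((line) => line.split('|').map((alt) => alt.trim().toLowerCase()).filter(Boolean))
    .filter((alts) => alts.length > 0);
}

// Return dictionary entries where none of the alternatives is already configured.
function newCandidates(text, configured) {
  const known = new Set(configured.map((s) => s.toLowerCase()));
  return parseDictionary(text).filter((alts) => !alts.some((alt) => known.has(alt)));
}

// Made-up sample text, not the real file contents.
const sample = 'straße|strasse\ngasse\n\nufer\n';
console.log(newCandidates(sample, ['straße', 'platz'])); // [ [ 'gasse' ], [ 'ufer' ] ]
```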
need to add acceptance-tests
implementation of https://github.com/pelias/model/issues/144
this method is slightly different from the previous one in that all versions of the name (including aliases) are considered, rather than only the primary name.
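A rough sketch of the difference between expanding only the primary name and expanding every name including aliases. `variants` is a deliberately tiny, hypothetical stand-in for the permutation step (it only splits a trailing "straße" off a concatenated name), and the names array (primary first, then aliases) is an assumed shape rather than the pelias/model document structure.

```javascript
// Hypothetical stand-in for the permutation step: "Hauptstraße" -> "Haupt straße".
function variants(name) {
  const match = /^(.*\S)straße$/i.exec(name);
  return match ? [name, `${match[1]} straße`] : [name];
}

// Previous behaviour (sketch): only the primary name is expanded.
function expandPrimaryOnly(names) {
  if (names.length === 0) return [];
  return [...variants(names[0]), ...names.slice(1)];
}

// New behaviour (sketch): every name, aliases included, is expanded.
function expandAllNames(names) {
  return names.flatMap(variants);
}

console.log(expandPrimaryOnly(['Hauptstraße', 'Alte Hauptstraße']).length); // 3
console.log(expandAllNames(['Hauptstraße', 'Alte Hauptstraße']).length);    // 4
```

Expanding every name also means an alias that is already a separated form, such as "Haupt straße", can reappear in the output, which is where duplicates come from.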
note: this method produces duplicate names. The subsequent post-processing step 'deduplication' removes duplicate entries, so I felt it wasn't necessary to do so within this script, at the cost of having duplicate entries in the tests.
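Since duplicates are left for the later deduplication step, here is a minimal sketch of an order-preserving dedupe that keeps the first occurrence of each name. `dedupe` and its default `normalize` function are hypothetical; this isn't the pelias/model deduplication code, and whether that step compares names case-insensitively is an assumption here.

```javascript
// Order-preserving de-duplication: keeps the first occurrence of each name.
// `normalize` decides which names count as duplicates; by default the
// comparison is case-insensitive and ignores leading/trailing whitespace.
function dedupe(names, normalize = (s) => s.trim().toLowerCase()) {
  const seen = new Set();
  const result = [];
  for (const name of names) {
    const key = normalize(name);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(name);
    }
  }
  return result;
}

console.log(dedupe(['Hauptstraße', 'haupt straße', 'Haupt Straße'])); // [ 'Hauptstraße', 'haupt straße' ]
```

Keeping this as a separate pass lets each expansion step stay simple and emit duplicates freely; the trade-off is the duplicate entries in the unit tests.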
resolves https://github.com/pelias/model/issues/144
resolves https://github.com/pelias/api/issues/1594