This change adds the removeAllZeroNumericPrefix filter to the peliasPhrase analyzer. The idea is to help ensure postalcodes with a leading zero can show up in autocomplete queries, as reported in https://github.com/pelias/pelias/issues/898.
The change is based on two assumptions:
1. We want the peliasPhrase, peliasQuery, and peliasIndexOneEdgeGram analyzers to all handle leading zeros similarly.
2. We do want to remove leading zeros in our analyzers.
As I recall, the original motivation for removing leading zeros was that street names like 05th avenue sometimes appear in various data sources (or possibly queries), and we want those to match 5th avenue.
The original code for this was written quite a long time ago, so it's hard to say for sure.
Anyway, assuming we do want to handle those cases, removing leading zeros everywhere lets us handle postalcodes that start with zero. There are downsides, of course: the leading zeros are ignored completely, so we can't distinguish between 01000 and 1000, either of which might be a valid postalcode or housenumber.
This can lead to clearly incorrect results, such as 1000 main street matching a request for a hypothetical 01000 postalcode. But I think it's the best we can do without a lot more work.
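To make the tradeoff concrete, here is a rough sketch of the behavior described above. It is not the actual filter definition: the regex, the `strip_leading_zeros` helper, and the whitespace tokenizer in `analyze` are all assumptions for illustration, standing in for whatever removeAllZeroNumericPrefix and the analyzers really do.

```python
import re
from typing import List

# Assumption: approximate the filter as "drop zeros at the start of a token
# when they are followed by another digit". The real filter may differ.
_LEADING_ZEROS = re.compile(r"^0+(?=\d)")


def strip_leading_zeros(token: str) -> str:
    """Hypothetical stand-in for removeAllZeroNumericPrefix."""
    return _LEADING_ZEROS.sub("", token)


def analyze(text: str) -> List[str]:
    """Hypothetical, heavily simplified analyzer: lowercase, split on
    whitespace, then strip leading zeros from each token."""
    return [strip_leading_zeros(token) for token in text.lower().split()]


# Upside: "05th avenue" and "5th avenue" produce the same tokens.
print(analyze("05th avenue") == analyze("5th avenue"))

# Downside: a 01000 postalcode query and a 1000 housenumber share a token.
print(analyze("01000")[0] == analyze("1000 main street")[0])
```

Both comparisons come out true under these assumptions, which is exactly why the postalcode fix and the false-positive risk arrive together.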
I tested this code with a global set of postalcodes and it does allow the relevant postalcodes to match.
Assuming no one has a better idea, we can move forward with testing this PR on a full planet build and go from there.
Fixes https://github.com/pelias/pelias/issues/898