pelias / parser

natural language classification engine for geocoding
https://parser.demo.geocode.earth
MIT License

Special handling of streets with no suffix #140

Closed: missinglink closed this issue 3 years ago

missinglink commented 3 years ago

Some street names consist of a single word without a street suffix. A well-known example of this is Broadway (Manhattan).

The parser currently doesn't parse addresses on these streets very well:

node bin/cli.js 24 broadway
...
(0.86) ➜ [ { street: '24 broadway' } ]

We are interpreting the input as a numeric street name with the ordinal suffix dropped (i.e. "24" standing for "24th"), just as in the following, which is a correct parse:

node bin/cli.js 24 street
...
(0.86) ➜ [ { street: '24 street' } ]

Interestingly, we have "Broadway" listed as a street suffix, although I'm not aware of anywhere in the world where this is common; the USPS doesn't list it as a common street suffix in the USA.

So removing that suffix may help resolve this issue to some degree.
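To illustrate why the suffix list matters, here is a deliberately simplified, hypothetical classifier. It is not the pelias/parser implementation, and the function name `classify` is made up for this sketch. It models only the "<number> <suffix>" rule described above, which reproduces the "24 broadway" misparse while "broadway" is in the suffix set.

```javascript
// Hypothetical, simplified classifier: NOT the pelias/parser implementation.
// It only models the "<number> <suffix>" rule discussed in this issue.
function classify(input, suffixes) {
  const tokens = input.trim().toLowerCase().split(/\s+/);
  if (tokens.length !== 2) return null;
  const [first, second] = tokens;
  const isNumber = /^\d+$/.test(first);
  // A bare number followed by a street suffix reads like "24th street"
  // with the ordinal dropped, so the whole string becomes a street name.
  if (isNumber && suffixes.has(second)) {
    return { street: tokens.join(" ") };
  }
  return null; // the rule does not apply
}

// Hypothetical suffix sets, with and without "broadway".
const withBroadway = new Set(["street", "avenue", "broadway"]);
const withoutBroadway = new Set(["street", "avenue"]);

console.log(classify("24 broadway", withBroadway));    // { street: '24 broadway' }
console.log(classify("24 broadway", withoutBroadway)); // null
console.log(classify("24 street", withoutBroadway));   // { street: '24 street' }
```

In this toy model, dropping "broadway" from the suffix set stops the misparse but leaves the input unclassified, which is the remaining difficulty with bare proper names.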

Another similar street name I can think of is "Esplanade", which we may be able to handle similarly.

I think that in the absence of a suffix it might still be difficult to classify these strings as streets, since they're just proper names with no surrounding context. If that's the case, we may need to keep a list of proper names that are common street names in their own right, such as "Broadway".
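A minimal sketch of the proper-name list idea, assuming a hypothetical `STANDALONE_STREET_NAMES` set and a hypothetical `parseStandaloneStreet` helper (neither is part of pelias/parser as far as this issue states). When the first token is a number and the rest of the input matches a listed name, the number is taken as the house number and the name as the street.

```javascript
// Hypothetical list of proper names that are street names in their own right.
// The entries come from this issue; a real list would need curation.
const STANDALONE_STREET_NAMES = new Set(["broadway", "esplanade"]);

// Hypothetical helper: classify "<number> <name>" when <name> is on the list.
function parseStandaloneStreet(input) {
  const tokens = input.trim().split(/\s+/);
  // Require a leading house number followed by at least one more token.
  if (tokens.length < 2 || !/^\d+$/.test(tokens[0])) return null;
  const name = tokens.slice(1).join(" ");
  // Case-insensitive lookup, but keep the caller's original casing.
  if (!STANDALONE_STREET_NAMES.has(name.toLowerCase())) return null;
  return { housenumber: tokens[0], street: name };
}

console.log(parseStandaloneStreet("24 broadway")); // { housenumber: '24', street: 'broadway' }
console.log(parseStandaloneStreet("24 street"));   // null
```

A lookup like this favours precision over recall: it only fires for names someone has added, so single-word streets missing from the list would still be misclassified.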

Some other similar cases to consider when testing this work:

see: https://onmilwaukee.com/articles/broadway