openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Shire abbreviation in the UK #412

Open datamacgyver opened 5 years ago

datamacgyver commented 5 years ago

Hi!

I was checking out libpostal, and saw something that could be improved.


My country is

UK


Here's how I'm using libpostal

Currently just evaluating it as a tool we may want to use in a project


Here's what I did

Fed it the following (fake) address:

Home Farm Fen Lane Derbys DE5 2AO


Here's what I got

[('home farm', 'house'), ('fen lane derbys', 'road'), ('de5 2ao', 'postcode')]


Here's what I was expecting

[('home farm', 'house'), ('fen lane', 'road'), ('derbyshire', 'state_district'), ('de5 2ao', 'postcode')]




Here's what I think could be improved

In the UK, many counties have names ending in -shire (Derbyshire, Nottinghamshire, Yorkshire), because shires were historically a key administrative division. These names are commonly abbreviated in everyday use, usually to a shortened form ending in -s: Derbys (Derbyshire), Notts (Nottinghamshire), Yorks (Yorkshire). Recognising these abbreviations would matter to any UK-based user.

syserr0r commented 2 years ago

I think #418 better covers this: in your example you abbreviated Nottinghamshire to Notts, so it isn't just a matter of replacing shire -> s.
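
Since the abbreviations don't follow a single suffix rule, one possible stopgap is to expand them from a lookup table before handing the string to the parser. The snippet below is only a sketch of that idea in plain Python: `SHIRE_ABBREVIATIONS` and `expand_shire_abbreviations` are made-up names, not part of libpostal, and the table holds just the three abbreviations mentioned in this issue. It does not check how libpostal labels the expanded string afterwards.

```python
import re

# Hypothetical lookup table (not part of libpostal); it contains only the
# three abbreviations mentioned in this issue and would need extending.
SHIRE_ABBREVIATIONS = {
    "derbys": "Derbyshire",
    "notts": "Nottinghamshire",
    "yorks": "Yorkshire",
}

# Match whole alphabetic words only, so postcode fragments such as "DE5"
# and full names such as "Yorkshire" are left alone.
_WORD = re.compile(r"\b[A-Za-z]+\b")


def expand_shire_abbreviations(address: str) -> str:
    """Replace known county abbreviations with their full names."""

    def _swap(match: re.Match) -> str:
        word = match.group(0)
        # Case-insensitive lookup; unknown words are returned unchanged.
        return SHIRE_ABBREVIATIONS.get(word.lower(), word)

    return _WORD.sub(_swap, address)


print(expand_shire_abbreviations("Home Farm Fen Lane Derbys DE5 2AO"))
# prints: Home Farm Fen Lane Derbyshire DE5 2AO
```

A table is used rather than a rule like "replace a trailing s with shire" because that rule would turn Notts into "Nottshire" and would also mangle ordinary words ending in s.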