openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

UK County abbreviations missing #418

Open philhutch50 opened 5 years ago

philhutch50 commented 5 years ago

Hi!

I was checking out libpostal, and saw something that could be improved.


My country is United Kingdom

Here's how I'm using libpostal

I am using libpostal to improve my current data wrangling software.


Here's what I did

10 Lewis Road Kettering Northants NN15 6HE


Here's what I got

{10 House Number}{lewis rd road}{kettering city}{northants nn15 6he house}


Here's what I was expecting

{10 House Number}{lewis rd road}{kettering city}{northants state}{nn15 6he postcode}

If you add UK to the address you get

{Kettering City}{Northants house}{nn15 6he postcode}




Here's what I think could be improved

Northants is a common UK abbreviation of Northamptonshire

(see https://www.familysearch.org/wiki/en/UK_County_Abbreviations)

Is there a way I can add these county abbreviations, the way you do for US states?

Many thanks, Phil

antimirov commented 5 years ago

Hi. Do you mean something like this file? https://github.com/openvenues/libpostal/blob/master/resources/dictionaries/en/toponyms.txt

philhutch50 commented 5 years ago

I found the USA abbreviations in https://github.com/openvenues/libpostal/tree/master/resources/states

If I build a UK abbreviation list, I am not sure how I would then get libpostal to use it. A states-style list seems the obvious way to do it, as entries such as

Northants | Northamptonshire
Lancs | Lancashire
Notts | Nottinghamshire

etc. could be added, I believe. @albarrentine, is this right? And if so, how do I update libpostal to use a GB states list? Thanks!
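Until there is an answer on how libpostal itself could consume such a list, one possible workaround is to expand the abbreviations before the address reaches the parser. The following is only a minimal sketch under that assumption: it does not touch libpostal at all, the helper names `load_abbreviations` and `expand_counties` are made up for illustration, and the list contains only the three pairs mentioned above.

```python
import re

# Hypothetical abbreviation list in "abbreviation | full name" form.
# Only the three pairs mentioned in this thread are included.
UK_COUNTY_LINES = """
Northants | Northamptonshire
Lancs | Lancashire
Notts | Nottinghamshire
"""

def load_abbreviations(text):
    """Parse 'abbr | full name' lines into a dict keyed by lowercase abbreviation."""
    mapping = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank or malformed lines
        abbr, full = (part.strip() for part in line.split("|", 1))
        mapping[abbr.lower()] = full
    return mapping

def expand_counties(address, mapping):
    """Replace whole-word county abbreviations (optionally dotted) with full names."""
    def swap(match):
        word = match.group(0)
        # Unknown words are returned unchanged, including any trailing dot.
        return mapping.get(word.lower().rstrip("."), word)
    # \b...\b keeps postcode fragments such as "NN15" from matching.
    return re.sub(r"\b[A-Za-z]+\b\.?", swap, address)

COUNTIES = load_abbreviations(UK_COUNTY_LINES)
print(expand_counties("10 Lewis Road Kettering Northants NN15 6HE", COUNTIES))
# prints: 10 Lewis Road Kettering Northamptonshire NN15 6HE
```

The expanded string could then be handed to libpostal as usual. Whether the parser labels the full county name the way I expect is something I have not verified, and a blind word swap like this could also rewrite a street or town name that happens to match an abbreviation.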

antimirov commented 5 years ago

Adding this info is only the first step. After that, most of the time, we have to wait until Al runs the pipeline. I've never tried it myself, but I remember it takes 2 weeks and significant RAM and storage resources.

baerbock commented 5 years ago

@philhutch50 Could you please publish your full abbreviation file here?

philhutch50 commented 5 years ago

@baerbock I don't really have a full list. I started and then stopped, because unless there is an easy way to import these, you would need to code new routines to sort it out, and I have had no reply from the man himself yet. Unless I have missed something?

syserr0r commented 2 years ago

I ran into the same issue.

Some starting places might be:

Would producing a list of counties and their possible abbreviations help?
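If such a list would help, here is a rough sketch of how it might be kept and exported. It assumes, without my having checked the repository, that the dictionary files under resources/dictionaries are lowercase and pipe-delimited with the canonical form first. The data holds only the counties named in this thread, and `to_dictionary_lines` is a made-up helper name.

```python
# Hypothetical data: county name -> known abbreviations.
# Only the entries mentioned in this thread; a real list would be much longer.
COUNTY_ABBREVIATIONS = {
    "Northamptonshire": ["Northants"],
    "Lancashire": ["Lancs"],
    "Nottinghamshire": ["Notts"],
}

def to_dictionary_lines(counties):
    """Render one lowercase 'canonical|abbr1|abbr2' line per county, sorted by name."""
    lines = []
    for name in sorted(counties):
        phrases = [name] + list(counties[name])
        lines.append("|".join(phrase.lower() for phrase in phrases))
    return lines

for line in to_dictionary_lines(COUNTY_ABBREVIATIONS):
    print(line)
# prints:
# lancashire|lancs
# northamptonshire|northants
# nottinghamshire|notts
```

As noted earlier in the thread, adding entries like these would only be the first step, since the parser output would presumably not change until the pipeline is run again.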