openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License
4.08k stars 421 forks

UK Counties being returned in Postcode Field #673

Open philhutch50 opened 1 month ago

philhutch50 commented 1 month ago

Hi!

I was checking out libpostal, and saw something that could be improved.


My country is the United Kingdom


Here's how I'm using libpostal

I use it to build keys for our internal address matching software.


Here's what I did

16 Southfield Drive Barton Seagrave Northamptonshire NN15 5YQ


Here's what I got

Result:

{ "house_number": "16", "road": "southfield drive", "city": "barton seagrave", "house": "northamptonshire nn15 5yq" }

So I tried adding United Kingdom

Result:

{ "house_number": "16", "road": "southfield drive", "city": "barton seagrave", "postcode": "northamptonshire nn15 5yq", "country": "united kingdom" }


Here's what I was expecting

Northamptonshire, Buckinghamshire and Gloucestershire are not being correctly recognised as counties within an address, and the postcode field does not always return a correct postcode. Here the county ends up in the "house" or "postcode" field alongside the postcode.
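
Until the parser handles this case, the merged value can be split after parsing. The following is a minimal Python sketch, not part of libpostal: the function name `split_postcode_field` is hypothetical, the regular expression is a simplified UK postcode pattern that ignores special cases, and putting the county under `state_district` is an assumption about which label fits best.

```python
import re

# Simplified UK postcode pattern: outward code, optional space, inward code.
# Assumption: special cases such as "GIR 0AA" are deliberately not covered.
UK_POSTCODE = re.compile(
    r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})$", re.IGNORECASE
)


def split_postcode_field(parsed):
    """Return a copy of a parse result with the postcode separated from any
    text that precedes it in the "postcode" or "house" field."""
    result = dict(parsed)
    for key in ("postcode", "house"):
        value = result.get(key, "")
        match = UK_POSTCODE.search(value)
        if not match:
            continue
        prefix = value[:match.start()].strip()
        if key != "postcode":
            del result[key]
        # Keep the lower-case style of the parser output.
        result["postcode"] = f"{match.group(1)} {match.group(2)}".lower()
        if prefix:
            # Assumption: the leftover text is a county.
            result.setdefault("state_district", prefix)
        break
    return result


if __name__ == "__main__":
    parsed = {
        "house_number": "16",
        "road": "southfield drive",
        "city": "barton seagrave",
        "postcode": "northamptonshire nn15 5yq",
        "country": "united kingdom",
    }
    print(split_postcode_field(parsed))
```

For the second result in this report, this yields "postcode": "nn15 5yq" and "state_district": "northamptonshire", leaving the other fields unchanged. It only works around the symptom; a genuine house name followed by a postcode in the "house" field would be mislabelled as a county.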


For parsing issues, please answer "yes" or "no" to all that apply.

  • Does the input address exist in OpenStreetMap (https://openstreetmap.org)? Yes: https://www.openstreetmap.org/#map=19/52.373206/-0.694982
  • Do all the toponyms exist in OSM (city, state, region names, etc.)? Yes
  • If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result? No
  • If the address does not contain city, region, etc., does adding those fields to the input improve the result?
  • If the address contains apartment/floor/sub-building information or uncommon formatting, does removing that help? Is there any minimum form of the address that gets the right parse?


Here's what I think could be improved

UK counties are not being recognised correctly.

brianmacy commented 1 month ago

Have you tried the Senzing data model? It is built on much more up-to-date UK data.

philhutch50 commented 1 month ago

I think I'm using the Senzing data at the moment.


brianmacy commented 1 month ago

If so, you should submit an issue at: https://github.com/Senzing/libpostal-data/issues
