openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License
4.06k stars, 417 forks

parse address splits correct post code into house_number #510

Open Aknilam opened 4 years ago

Aknilam commented 4 years ago

Hi!

I was checking out libpostal, and saw something that could be improved.


My country is

DE


Here's how I'm using libpostal

REST API


Here's what I did

query=Wendenstr. 414-424 20537 Hamburg


Here's what I got

[
  {
    "label": "road",
    "value": "wendenstr. 414-424 2053"
  },
  {
    "label": "house_number",
    "value": "7"
  },
  {
    "label": "city",
    "value": "hamburg"
  }
]

Here's what I was expecting

[
  {
    "label": "road",
    "value": "wendenstr."
  },
  {
    "label": "house_number",
    "value": "414-424"
  },
  {
    "label": "postcode",
    "value": "20537"
  },
  {
    "label": "city",
    "value": "hamburg"
  }
]



Here's what I think could be improved

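Fix the incorrect splitting of the address: the postcode "20537" is split, its first four digits are appended to the road, and its last digit is extracted as the house_number. The actual house number "414-424" also ends up in the road.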

tobwen commented 3 years ago

libpostal seems to be trained against a schema like "street name house number, postcode city-district" for the German data, where the street name isn't allowed to start with a number, etc. Parsing works with "Wendenstr. 414-424, 20537 Hamburg"; all I did was add the missing comma.
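The comma workaround can be sketched as a small preprocessing step that runs before the query is sent to libpostal. This is only a rough sketch: the function name `add_postcode_comma` is made up for illustration, and it assumes German-style input where the postcode is a standalone five-digit token directly followed by the city name. It does not call libpostal itself.

```python
import re

# Assumption: the postcode is a standalone five-digit token followed by a
# non-numeric word (the city), and it is not already preceded by a comma.
_POSTCODE_WITHOUT_COMMA = re.compile(r"(?<=[^\s,])\s+(\d{5})(?=\s+\D)")


def add_postcode_comma(address: str) -> str:
    """Insert ', ' before the first five-digit postcode that lacks a comma."""
    return _POSTCODE_WITHOUT_COMMA.sub(r", \1", address, count=1)


if __name__ == "__main__":
    print(add_postcode_comma("Wendenstr. 414-424 20537 Hamburg"))
```

The rewritten string can then be used as the `query` value for the REST API, or passed to a binding such as pypostal's `parse_address`. Input that already contains the comma is left unchanged.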

Just my 2 cents: I've stopped using libpostal, since the input data needs to be too clean.

ChargedMonk commented 2 years ago

Just my 2 cents: I've stopped using libpostal, since the input data needs to be too clean.

@tobwen, is there a good working alternative?