openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Input delimiters are ignored #405

Open missinglink opened 5 years ago

missinglink commented 5 years ago

Hi!

I was checking out libpostal, and saw something that could be improved.


My country is

N/A


Here's how I'm using libpostal

Pelias


Here's what I did

> Front Street, Hamilton Island, QLD

Here's what I got

{
  "road": "front street hamilton",
  "suburb": "island",
  "state": "qld"
}

Here's what I was expecting

I was expecting libpostal to treat the commas as token boundaries, rather than merging "Front Street" and "Hamilton", which come from different comma-separated groups of the input, into a single road.



Here's what I think could be improved

libpostal could take advantage of human-entered delimiters (such as commas) to determine token boundaries.
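
To make the problem concrete, here is a rough Python sketch that flags parsed components whose tokens come from more than one comma-separated group of the input. It does not call libpostal at all: it only inspects the raw input and a list of (label, value) pairs like the output above. The helper names `comma_groups` and `straddling_components` are made up for illustration, and the sketch assumes the parser output is lowercased, keeps the input token order, and drops nothing except the commas.

```python
def comma_groups(address):
    """Split the raw input on commas; return each non-empty group as lowercase tokens."""
    return [group.lower().split() for group in address.split(",") if group.strip()]


def straddling_components(address, parsed):
    """Return the (label, value) pairs whose tokens span more than one comma group.

    `parsed` is a list of (label, value) pairs in input order.
    Assumption: output tokens line up one-to-one with the input tokens.
    """
    # Tag every input token with the index of the comma group it came from.
    group_of = [i for i, group in enumerate(comma_groups(address)) for _ in group]

    flagged = []
    pos = 0
    for label, value in parsed:
        n_tokens = len(value.split())
        # Collect the group indices covered by this component's tokens.
        groups = set(group_of[pos:pos + n_tokens])
        if len(groups) > 1:
            flagged.append((label, value))
        pos += n_tokens
    return flagged


parsed = [
    ("road", "front street hamilton"),
    ("suburb", "island"),
    ("state", "qld"),
]
print(straddling_components("Front Street, Hamilton Island, QLD", parsed))
# [('road', 'front street hamilton')]
```

A check like this could run as a post-processing step in a caller such as Pelias to detect results that ignore the user's delimiters. It does not fix the parse itself.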

missinglink commented 5 years ago

Another example:

> 4015 N ALBINA ave, portland

[
  {
    "label": "house_number",
    "value": "4015"
  },
  {
    "label": "road",
    "value": "n albina ave portland"
  }
]

antimirov commented 5 years ago

Unfortunately, this is a documented "feature" of libpostal: commas are indeed ignored. It most likely isn't going to change, even though there have been multiple requests about it.

vlasvlasvlas commented 5 years ago

A comma delimiter may help a lot!