openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Italian feedback #664

Open opk12 opened 3 months ago

opk12 commented 3 months ago

Hi!

The OpenStreetMap Italia community was checking out libpostal, and saw something that could be improved.


My country is

Italy


Here's how I'm using libpostal

libpostal was recently discussed in the OpenStreetMap Italia chatroom. The community is not familiar with the libpostal internals, but here is some feedback.


Here's what I did

./libpostal "via rialto n.10, 33100 udine"


Here's what I got

via rialto n 10 33100 udine
via rialto numero 10 33100 udine
via rialto nord 10 33100 udine
vicenza rialto n 10 33100 udine
vicenza rialto numero 10 33100 udine
vicenza rialto nord 10 33100 udine

Here's what I was expecting

  1. The n 10 output is unexpected: the abbreviation is left as is instead of being expanded to numero 10.
  2. via is so common, and so different from vicenza, that the expansion did not feel plausible to the community. I can see how vicenza may come from toponyms.txt containing vicenza|vi, with via theoretically being a typo of vi. As a native speaker, I find that reasoning quite a stretch. To be fair, vicenza is always ordered below via in the output, so I'm not sure whether anything should change here.
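
To illustrate why exactly six lines come back, here is a toy model in Python. It is not libpostal's implementation; it only assumes that each token has a list of candidate replacements and that the output is every combination of them. The ALTERNATIVES table and the toy_expand function are hypothetical, and the entries are inferred from the output above rather than taken from the libpostal dictionaries.

```python
from itertools import product

# Hypothetical candidate replacements per token, inferred only from the
# output in this report. The real libpostal dictionaries may differ.
ALTERNATIVES = {
    "via": ["via", "vicenza"],
    "n": ["n", "numero", "nord"],
}


def toy_expand(address):
    """Return every combination of per-token candidates (toy model)."""
    # Crude normalization: lowercase, treat '.' and ',' as separators.
    tokens = address.lower().replace(".", " ").replace(",", " ").split()
    # Tokens without an entry keep themselves as their only candidate.
    choices = [ALTERNATIVES.get(token, [token]) for token in tokens]
    return [" ".join(combo) for combo in product(*choices)]


expansions = toy_expand("via rialto n.10, 33100 udine")
for line in expansions:
    print(line)
```

Under these assumptions, 2 candidates for via times 3 candidates for n gives the 6 lines reported. Dropping a single dictionary entry, such as the vicenza candidate, would halve the output in this model.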