openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Incorrect parsed result in HaNoi Vietnam #333

Open thucnc opened 6 years ago

thucnc commented 6 years ago

Hi!

I am using the Pelias libpostal Docker image and ran into an issue parsing addresses in Ha Noi (Vietnam). Specifically, the parsed components are incorrect: the parser tends to merge the street and the district into the street component. For example, the address "169 Nguyễn Ngọc Vũ, Cau Giay, Ha Noi" is parsed into

"parsed_text": {
    "number": "169",
    "street": "nguyễn ngọc vũ cau giay",
    "city": "ha noi"
}

where Cau Giay is a district of Ha Noi city.

You can double-check this issue at http://pelias.github.io/compare/#/v1/search%3Ftext=169%20Nguy%E1%BB%85n%20Ng%E1%BB%8Dc%20V%C5%A9,%20Cau%20Giay,%20Ha%20Noi

On the other hand, libpostal works well with addresses in Ho Chi Minh City.

albarrentine commented 6 years ago

Hm, it looks like that's mapped in a few different ways in OSM, which might create some confusion for the parser.

There's a suburb node, although it looks like it was added after the last training set was created. There are also a few addresses that have it as addr:district, which maps to state_district in our nomenclature, so we'll need to audit the use of that tag for the next training build and make sure those all map to city_district. Is this Wikipedia article a decent list of all the places that should be city districts? https://en.wikipedia.org/wiki/List_of_urban_districts_of_Vietnam
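
To make the remapping described above concrete, here is a minimal Python sketch. It is not libpostal's actual training code: the function name `map_district_label`, the `CITY_DISTRICT_COUNTRIES` set, and the sample tag values are all hypothetical, and the only things taken from the thread are the OSM tag `addr:district` and the labels `state_district` and `city_district`.

```python
# Hypothetical sketch, not libpostal's real training pipeline.
# Assumption: for the countries listed here, OSM's addr:district tag
# denotes an urban district of a city rather than a state-level district.
CITY_DISTRICT_COUNTRIES = {"vn"}


def map_district_label(osm_tags, country_code):
    """Return (label, value) for an addr:district tag, or None if the tag is absent."""
    value = osm_tags.get("addr:district")
    if value is None:
        return None
    if country_code.lower() in CITY_DISTRICT_COUNTRIES:
        label = "city_district"
    else:
        # Default mapping mentioned in the comment above.
        label = "state_district"
    return label, value


# Illustrative tags for the address from this issue (values are assumed, not taken from OSM).
tags = {
    "addr:housenumber": "169",
    "addr:street": "Nguyễn Ngọc Vũ",
    "addr:district": "Cầu Giấy",
    "addr:city": "Hà Nội",
}

print(map_district_label(tags, "vn"))
```

Keeping the country list explicit means the default `state_district` mapping stays untouched for every other country until its use of the tag has been audited.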

thucnc commented 6 years ago

@albarrentine I can provide all the district names of Ha Noi if you need them.

thucnc commented 6 years ago

@albarrentine I checked the list at https://en.wikipedia.org/wiki/List_of_urban_districts_of_Vietnam, and it is correct for Ha Noi and Ho Chi Minh City, so you can use it. Please note that the list uses English names (such as Tu Liem South, Tu Liem North), while the Vietnamese names are quite different (such as Nam Tu Liem, Bac Tu Liem).
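
The naming difference noted above could be handled with a small alias table, roughly as in the Python sketch below. The helper names `normalize` and `canonical_district` and the `DISTRICT_ALIASES` table are hypothetical and are not part of libpostal; the two alias pairs are the ones mentioned in this comment. The sketch also assumes that matching should ignore Vietnamese diacritics and letter case.

```python
import unicodedata

# English-style names from the Wikipedia list mapped to the Vietnamese names
# (both pairs come from this comment; the table itself is hypothetical).
DISTRICT_ALIASES = {
    "tu liem south": "nam tu liem",
    "tu liem north": "bac tu liem",
}


def normalize(name):
    """Lowercase a name, strip diacritics, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The letter "đ" has no Unicode decomposition, so it is mapped by hand.
    lowered = stripped.lower().replace("đ", "d")
    return " ".join(lowered.split())


def canonical_district(name):
    """Return one canonical, diacritic-free key for a district name."""
    key = normalize(name)
    return DISTRICT_ALIASES.get(key, key)


print(canonical_district("Tu Liem South"))
print(canonical_district("Nam Từ Liêm"))
```

With this, the English form and the Vietnamese form (with or without diacritics) resolve to the same key, so a district list built from either source can be compared directly.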