osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0

Parse localized housenumber #177

Open clkao opened 10 years ago

clkao commented 10 years ago

The query 4 研究院路二段12巷14弄 yields the correct search result.

However, addresses in Taiwan are usually written as 研究院路二段12巷14弄4號, where the house number comes at the end and is followed by the character "號" (number). The query parser should probably recognize this alternative form in addition to <housenumber> <streetname>.
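To illustrate the request, here is a minimal sketch of a pre-processing step that rewrites the trailing form into the leading form the parser already handles. The function name `normalize_tw_query` and the regular expression are hypothetical and not part of Nominatim; the sketch assumes the house number is a plain run of digits directly before a final "號".

```python
import re

# Hypothetical helper, not part of Nominatim.
# Assumption: the house number is a run of digits immediately before a final "號".
TRAILING_HOUSENUMBER = re.compile(r"^(?P<street>.*\D)(?P<number>\d+)號$")


def normalize_tw_query(query: str) -> str:
    """Rewrite '<streetname><housenumber>號' as '<housenumber> <streetname>'."""
    query = query.strip()
    match = TRAILING_HOUSENUMBER.match(query)
    if match is None:
        # Not in the trailing form; leave the query untouched.
        return query
    return f"{match.group('number')} {match.group('street')}"


print(normalize_tw_query("研究院路二段12巷14弄4號"))
```

Queries that do not end in "號" are returned unchanged, so the form that already works is not affected.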

lonvia commented 10 years ago

Ignoring the number sign should be fairly easy. However, there is the much deeper issue of correctly tokenizing addresses written in scripts that do not use spaces. Your example gets split like this:

研-究-院-路-二-段12-巷14-弄4-號 (transliterated: yan jiu yuan lu er duan12 xiang14 nong4 hao)

If I understand you correctly, it should rather be:

研-究-院-路-二-段-12巷-14弄-4號

or, to be sure:

研-究-院-路-二-段-12-巷-14-弄-4號
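As a rough illustration of the first proposed split (numbers attached to the unit character that follows them), the sketch below uses a single regular expression. It is not how Nominatim tokenizes; the function name `split_tw_address` and the unit set 巷/弄/號 are assumptions taken from this one example.

```python
import re

# Hypothetical tokenizer sketch, not Nominatim's implementation.
# Assumption: a number binds to a directly following 巷, 弄 or 號;
# every other non-space character becomes its own token.
TOKEN = re.compile(r"\d+[巷弄號]?|[^\d\s]")


def split_tw_address(text: str) -> list:
    """Split an unspaced address into single characters and number+unit tokens."""
    return TOKEN.findall(text)


print("-".join(split_tw_address("研究院路二段12巷14弄4號")))
```

Dropping the optional `[巷弄號]?` part from the pattern would give the second, fully separated variant instead.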

clkao commented 10 years ago

Yes, and since addresses in Taiwan are hierarchical, it's actually:

Each of these is currently a unique street name in OSM, but ideally, if the alley can't be found, we should be able to traverse upward to the lane, or to the road.
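The upward traversal could look roughly like the sketch below, which produces the street names to try in order, from most to least specific. The function name `fallback_candidates` is hypothetical, and the sketch assumes that lane (巷) and alley (弄) levels always appear as a number followed by that character at the end of the name.

```python
import re
from typing import List

# Hypothetical helper, not part of Nominatim.
# Assumption: the lowest level is always "<digits>巷" or "<digits>弄" at the end.
LAST_LEVEL = re.compile(r"\d+[巷弄]$")


def fallback_candidates(street: str) -> List[str]:
    """Return the street name followed by its parents (alley, lane, road)."""
    candidates = [street]
    while True:
        shorter = LAST_LEVEL.sub("", street)
        if shorter == street or not shorter:
            # No further level to strip.
            break
        candidates.append(shorter)
        street = shorter
    return candidates


for name in fallback_candidates("研究院路二段12巷14弄"):
    print(name)
```

A search would try each candidate in turn and stop at the first one that exists as a street name.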

clkao commented 10 years ago

Another special case is hyphenated house numbers: a number like 2-1 is sometimes written as "2之1" or "2之1號".
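A minimal sketch of normalizing these variants to the hyphenated form follows. The function name `normalize_housenumber` is hypothetical and not part of Nominatim; it assumes "之" only acts as a separator when it sits between two digits.

```python
import re


def normalize_housenumber(raw: str) -> str:
    """Map '2之1' and '2之1號' to '2-1' (hypothetical helper)."""
    number = raw.strip()
    # Drop the trailing number sign, if present.
    if number.endswith("號"):
        number = number[:-1]
    # Assumption: "之" between two digits stands for a hyphen.
    return re.sub(r"(?<=\d)之(?=\d)", "-", number)


for raw in ("2-1", "2之1", "2之1號"):
    print(raw, "->", normalize_housenumber(raw))
```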