clkao opened this issue 10 years ago.
Ignoring the number sign should be fairly easy. However, there is the much deeper issue of correctly tokenizing addresses written in scripts that do not use spaces. Your example currently gets split like this: 研-究-院-路-二-段12-巷14-弄4-號 (transliterated: yan jiu yuan lu er duan12 xiang14 nong4 hao). If I understand you correctly, it should rather be 研-究-院-路-二-段-12巷-14弄-4號, or, to be safe, 研-究-院-路-二-段-12-巷-14-弄-4號.
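A minimal sketch of the first suggested split, assuming a simple regex tokenizer. The `tokenize` helper and `TOKEN_RE` pattern are hypothetical and not Nominatim's actual tokenizer. Each run of ASCII digits stays attached to the single character that follows it, and every other character becomes its own token.

```python
import re

# Hypothetical pattern: a run of ASCII digits plus the one character after it
# (e.g. "12巷"), or otherwise any single non-space character.
TOKEN_RE = re.compile(r"[0-9]+\S?|\S")


def tokenize(address: str) -> list[str]:
    """Split a space-free address, keeping number+marker pairs together."""
    return TOKEN_RE.findall(address)


print("-".join(tokenize("研究院路二段12巷14弄4號")))
# 研-究-院-路-二-段-12巷-14弄-4號
```

Swapping the pattern for `r"[0-9]+|\S"` would give the more conservative 12-巷 split instead.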
Yes, and since addresses in Taiwan are hierarchical, it's actually:
Each is currently a unique street name in OSM, but ideally, if the alley can't be found, we should be able to traverse upward to the lane or the road.
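One way the upward traversal could work is to strip the most specific level and retry until a known name is found. This sketch assumes that lanes and alleys always appear as a number followed by 巷 or 弄, appended to the road name. `fallback_names`, `resolve` and the `known_streets` set are hypothetical stand-ins for a real lookup against OSM data.

```python
import re
from typing import Iterable, Optional

# Hypothetical pattern: a lane (巷) or alley (弄) level is a number plus its marker.
LEVEL_RE = re.compile(r"[0-9]+[巷弄]")


def fallback_names(street: str) -> list[str]:
    """Return candidate names from most to least specific.

    Assumes levels are appended to the road name, e.g.
    研究院路二段12巷14弄 -> 研究院路二段12巷 -> 研究院路二段.
    """
    # Start offsets of each lane/alley component ("12巷", "14弄", ...).
    starts = [m.start() for m in LEVEL_RE.finditer(street)]
    names = [street]
    # Cut off one trailing level at a time, innermost (alley) first.
    for start in reversed(starts):
        if start > 0:  # never produce an empty name
            names.append(street[:start])
    return names


def resolve(street: str, known: Iterable[str]) -> Optional[str]:
    """Return the most specific candidate present in `known`, or None."""
    known_set = set(known)
    for name in fallback_names(street):
        if name in known_set:
            return name
    return None


# Hypothetical data: the alley is missing, but the lane exists.
known_streets = {"研究院路二段", "研究院路二段12巷"}
print(resolve("研究院路二段12巷14弄", known_streets))  # 研究院路二段12巷
```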
Another special case is hyphenated house numbers: a number like 2-1 can sometimes be written as "2之1" or "2之1號".
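A minimal normalization sketch, assuming the hyphenated form 2-1 is the canonical one to match against. `normalize_housenumber` is a hypothetical helper: it maps the 之 variants onto the hyphen and otherwise only drops a trailing 號.

```python
import re

# Hypothetical pattern for the 之 form: "2之1" or "2之1號".
ZHI_RE = re.compile(r"([0-9]+)之([0-9]+)號?")


def normalize_housenumber(raw: str) -> str:
    """Map 2之1 / 2之1號 to 2-1; otherwise just drop a trailing 號."""
    text = raw.strip()
    m = ZHI_RE.fullmatch(text)
    if m:
        return f"{m.group(1)}-{m.group(2)}"
    return text.removesuffix("號")  # str.removesuffix needs Python 3.9+


for raw in ["2-1", "2之1", "2之1號", "4號"]:
    print(raw, "->", normalize_housenumber(raw))
```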
The query 4 研究院路二段12巷14弄 yields the correct search result.
However, addresses in TW are usually written as 研究院路二段12巷14弄4號, with the house number at the end, followed by the "號" (number) character. The query parser should probably recognize alternative forms besides
<housenumber> <streetname>
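Here is a sketch of one such alternative form. It assumes the query is a single unbroken string ending in 號. The regex below, in a hypothetical `to_number_first` helper that is not part of any existing parser, pulls the trailing number off and rewrites the query into the number-first order that already returns the correct result. It also accepts the 之 and hyphen variants.

```python
import re

# Hypothetical pattern: everything before the final number is the street;
# the number may carry a 之/hyphen suffix (2之1, 2-1) and must end with 號.
TRAILING_NUMBER_RE = re.compile(
    r"(?P<street>.+?)(?P<number>[0-9]+(?:[-之][0-9]+)?)號"
)


def to_number_first(query: str) -> str:
    """Rewrite '<street><number>號' as '<number> <street>'; else return query as-is."""
    m = TRAILING_NUMBER_RE.fullmatch(query.strip())
    if m is None:
        return query  # not in the trailing-number form; leave untouched
    number = m.group("number").replace("之", "-")
    return f"{number} {m.group('street')}"


print(to_number_first("研究院路二段12巷14弄4號"))  # 4 研究院路二段12巷14弄
```

The lazy `street` group makes the street stop right before the last run of digits, so lane and alley numbers such as 12巷 and 14弄 stay inside the street name.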