Do all the toponyms exist in OSM (city, state, region names, etc.)?
If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result?
If the address does not contain city, region, etc., does adding those fields to the input improve the result?
If the address contains apartment/floor/sub-building information or uncommon formatting, does removing that help? Is there any minimum form of the address that gets the right parse?
Hi!
I was checking out libpostal and have a question. I read that the library supports parsing Japanese addresses, but when I tried it, the results did not look right. I would like to get some feedback from the awesome contributors (I tried addresses from other countries, and it works really well!).
My country is
US, but I'm using it for parsing Japanese addresses
Here's how I'm using libpostal
To help extract addresses from small business owners' websites
Here's what I did
from postal.parser import parse_address
text = '〒100-8994 東京都千代田区丸ノ内2-7-2'
parse_address(text)
Here's what I got
[('〒100-8994', 'postcode'), ('東', 'city'), ('京都千代田', 'city_district'), ('区', 'city'), ('丸ノ内', 'road'), ('2-7-2', 'house_number')]
Here's what I was expecting
The postcode is correct, but "東京都" (Tokyo Metropolis) should be the city, and "千代田区" should be the city district.
Here are a few other examples
Example 1
input:
text = '〒550-0002 大阪府大阪市西区江戸堀1丁目18番21号'
parse_address(text)
output: [('〒550-0002', 'postcode'), ('大', 'state'), ('阪', 'city'), ('府大阪市西', 'city_district'), ('区', 'city'), ('江戸堀', 'house'), ('1丁目', 'suburb'), ('18番', 'house_number'), ('21号', 'city_district')]
expected/correct parsing: 〒550-0002 大阪府 / 大阪市 / 西区 / 江戸堀 / 1丁目18番21号
Example 2
input:
text = '〒064-0809 北海道札幌市中央区南9条西3丁目2−5'
parse_address(text)
output: [('〒064-0809', 'postcode'), ('北', 'state'), ('海', 'city'), ('道札幌市中央区南9条西', 'road'), ('3丁目', 'suburb'), ('2-5', 'house_number')]
expected/correct parsing: 〒064-0809 北海道 / 札幌市 / 中央区 / 南9条西 / 3丁目2−5
Example 3
input:
text = '〒604-8064 京都府京都市中京区骨屋之町560 離れ'
parse_address(text)
output: [('〒604-8064', 'postcode'), ('京', 'state'), ('都', 'city'), ('府京都市中京区', 'city_district'), ('骨屋之町', 'road'), ('560', 'house_number'), ('離れ', 'road')]
expected/correct parsing: 〒604-8064 京都府 / 京都市 / 中京区 / 骨屋之町 / 560 離れ
Example 4
input:
text = '〒460-0031 愛知県名古屋市中区本丸1−1'
parse_address(text)
output: [('〒460-0031', 'postcode'), ('愛', 'state'), ('知県名古屋市中', 'city'), ('区', 'city_district'), ('本丸', 'suburb'), ('1-1', 'house_number')]
expected/correct parsing: 〒460-0031 愛知県 / 名古屋市 / 中区 / 本丸 / 1−1
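To make the expected segmentation concrete, here is a rough regex sketch of how I would split these addresses by their administrative suffixes (都/道/府/県, then 市, then 区). It is only a heuristic for illustration and has nothing to do with how libpostal works internally. The pattern, the function name `split_japanese_address`, and the group names (`prefecture`, `city`, `ward`, `rest`) are my own and are not libpostal labels. It will also mis-split names that contain the suffix characters themselves, for example a city like 四日市市.

```python
import re

# Heuristic pattern (an assumption for illustration, not libpostal's method):
# optional postcode, then prefecture, then optional city (…市) and ward (…区).
ADDRESS_RE = re.compile(
    r"^(?:〒?\s*(?P<postcode>\d{3}-\d{4})\s*)?"
    r"(?P<prefecture>東京都|北海道|(?:京都|大阪)府|.{2,3}県)"
    r"(?P<city>.+?市)?"
    r"(?P<ward>.+?区)?"
    r"(?P<rest>.*)$"
)


def split_japanese_address(text):
    """Return a dict of the components found, or None if nothing matches."""
    match = ADDRESS_RE.match(text.strip())
    if match is None:
        return None
    # Drop optional groups that did not match (e.g. no city for Tokyo wards).
    return {name: value for name, value in match.groupdict().items() if value}


if __name__ == "__main__":
    for sample in (
        "〒100-8994 東京都千代田区丸ノ内2-7-2",
        "〒550-0002 大阪府大阪市西区江戸堀1丁目18番21号",
    ):
        print(split_japanese_address(sample))
```

On the examples above this produces the splits I listed as expected (大阪府 / 大阪市 / 西区 / 江戸堀1丁目18番21号, and so on), so it may be useful as a baseline when comparing against the parser output.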
For parsing issues, please answer "yes" or "no" to all that apply.
Here's what I think could be improved