polm / posuto

🏣📮〠 Japanese postal code data.
MIT License

Special postal codes not handled #4

Closed kinow closed 3 years ago

kinow commented 3 years ago

Hi,

I saw that version 0.2.0 was out and that it had migrated from JSON to SQLite. I've never used a Python library that does that (not that I'm aware of), nor packaged one, so I decided to try it and see whether it worked, whether it would be slow, etc.

Installation was super smooth :+1: no issues found.

Then decided to test with a random address. Picked Tamana/Kumamoto (my distant family hometown), then googled a random address, and found this website: https://www.town.nagasu.lg.jp/default.html

The footer of the page contains: " 〒869-0198 熊本県玉名郡長洲町大字長洲2766番地 Tel:0968-78-3111 Fax:0968-78-1092"

I got the postal code, and tried the following code:

>>> import posuto
>>> posuto.get('〒869-0198')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/lib/python3.8/site-packages/posuto/posuto.py", line 52, in get
    base = dict(_fetch_code(code))
  File "/tmp/venv/lib/python3.8/site-packages/posuto/posuto.py", line 21, in _fetch_code
    raise KeyError("No such postal code: " + code)
KeyError: 'No such postal code: 8690198'
>>> 
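Note that the error message shows the input '〒869-0198' already reduced to the bare digits '8690198', so the lookup itself was fine and the code really was missing from the data. A minimal sketch of that kind of cleanup, assuming it amounts to dropping the postal mark and hyphen and folding full-width digits; `normalize_postal_code` is a hypothetical helper, not posuto's actual implementation:

```python
import unicodedata

ASCII_DIGITS = "0123456789"


def normalize_postal_code(raw: str) -> str:
    """Reduce a user-supplied Japanese postal code to seven ASCII digits.

    Hypothetical helper for illustration only; posuto's own cleanup
    logic may differ.
    """
    # NFKC folds full-width digits such as '８' into ASCII '8'.
    folded = unicodedata.normalize("NFKC", raw)
    # Keep digits only, which drops the postal mark, hyphens and spaces.
    digits = "".join(ch for ch in folded if ch in ASCII_DIGITS)
    if len(digits) != 7:
        raise ValueError(f"expected 7 digits, got {len(digits)}: {raw!r}")
    return digits


print(normalize_postal_code("〒869-0198"))
```

Rejecting anything that does not come out as exactly seven digits keeps typos from turning into confusing lookup failures later.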

Searching the same postal code on Google.co.jp returns the right location on the map.


Not sure how to provide a pull request, but thought it could be useful to report this missing postal code?

Anyway, great library, and nice trick of including an sqlite DB, might come in handy some day.

Thanks! Bruno

polm commented 3 years ago

Glad you had no trouble with the library except for the missing code!

This had me puzzled for a while, but it turns out this is a special postal code, known as a 大口事業所個別番号. So it seems it's only used for that one building. You can read more about the codes here:

https://www.post.japanpost.jp/zipcode/dl/jigyosyo/readme.html

If you use the general JP Post postal code search you'll see there's no result.

https://www.post.japanpost.jp/cgi-zip/zipcode.php?zip=869-0198

These special postal codes are provided in a separate CSV file that I haven't added to posuto. I guess I should work on doing that...

kinow commented 3 years ago

Ah, makes sense. I searched for the Sky Tree (〒131-8634) and it also didn't return anything (posuto or Japan Post). #TodayILearned.

kristate commented 3 years ago

Yes, it would be great if you could parse jigyosyo.csv [0] and add this information to the library.

[0] https://www.post.japanpost.jp/zipcode/dl/jigyosyo/readme.html
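For anyone who wants to experiment, here is a rough sketch of reading rows from that file. The column order is an assumption based on the Japan Post readme, and the field names are simply borrowed from the OfficeCode output that appears later in this thread; `OfficeRow` and `read_office_rows` are hypothetical names, and this is not how posuto itself builds its data. The sample row is hand-made from that same output, with placeholder values for the last three flag columns.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class OfficeRow(NamedTuple):
    # Assumed column order of the offices CSV (13 columns).
    jis: str
    kana: str
    name: str
    prefecture: str
    city: str
    neighborhood: str
    banchi: str
    postal_code: str
    old_code: str
    post_office: str
    type: str       # raw flag as text, left uninterpreted here
    multiple: str   # raw flag as text
    new: str        # raw flag as text


def read_office_rows(handle: TextIO) -> Iterator[OfficeRow]:
    """Yield one OfficeRow per well-formed CSV record."""
    for record in csv.reader(handle):
        if len(record) != len(OfficeRow._fields):
            continue  # skip blank or malformed lines
        yield OfficeRow(*(field.strip() for field in record))


# Hand-made sample row; the three trailing flags are placeholders.
SAMPLE = (
    '43368,"ナガスマチヤクバ","長洲町役場","熊本県","玉名郡長洲町",'
    '"大字長洲","2766","8690198","86901","長洲",0,0,0\n'
)

rows = list(read_office_rows(io.StringIO(SAMPLE)))
print(rows[0].postal_code, rows[0].name)
```

With the real download you would pass an open file instead of the StringIO. The readme describes the file as Shift_JIS, so something like `open(path, encoding="cp932", newline="")` is probably needed, but treat that as an assumption to verify against the actual file.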

polm commented 3 years ago

I did not get to this during this month; I'll look at it again next month.

polm commented 3 years ago

OK, I think this is working in the latest release, v0.4.0.

It turns out the data for these codes is different enough that converting it into the same format as normal postal codes doesn't make sense, so I just return them with a completely different structure. The JSON data is saved in a separate file.

Some other things about these codes:

I have used the term "company" above, but technically the organizations that get codes can be government offices or other organizations.

Also noting this here because it was hard to understand, but while the postal data uses five-digit JIS codes, the reference page uses six-digit codes everywhere. Turns out the sixth digit is a check digit that's calculated in an odd way.
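For reference, a sketch of the check digit rule as it is commonly documented for these local government codes: multiply the five digits by the weights 6, 5, 4, 3, 2, sum the products, take the remainder modulo 11, subtract that from 11, and keep the last digit of the result. The function names are hypothetical, and this covers only that general rule; any additional special cases mentioned in the reference page's README are not handled here.

```python
WEIGHTS = (6, 5, 4, 3, 2)


def check_digit(jis5: str) -> int:
    """Return the check digit for a five-digit JIS area code."""
    if len(jis5) != 5 or not (jis5.isascii() and jis5.isdigit()):
        raise ValueError(f"expected five ASCII digits: {jis5!r}")
    total = sum(int(digit) * weight for digit, weight in zip(jis5, WEIGHTS))
    # 11 minus the remainder can be 10 or 11; only the last digit is kept.
    return (11 - total % 11) % 10


def with_check_digit(jis5: str) -> str:
    """Append the check digit to get the six-digit form."""
    return jis5 + str(check_digit(jis5))


print(with_check_digit("01000"))  # Hokkaido, published as 010006
print(with_check_digit("13000"))  # Tokyo, published as 130001
```

Going the other way, dropping the last digit of a six-digit code from the reference page gives the five-digit code used in the postal data.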

kinow commented 3 years ago

> It turns out the data for these codes is different enough that converting it into the same format as normal postal codes doesn't make sense, so I just return them with a completely different structure. The JSON data is saved in a separate file.

Sounds really tricky to handle these codes.

> I have used the term "company" above, but technically the organizations that get codes can be government offices or other organizations.

:+1:

> Also noting this here because it was hard to understand, but while the postal data uses five-digit JIS codes, the reference page uses six-digit codes everywhere. Turns out the sixth digit is a check digit that's calculated in an odd way.

How did you figure it out? Had a look at that page (with my broken Japanese) and couldn't see an explanation of how to parse that code in the page or PDF files. :nerd_face:

It's working for me now :+1:

(venv) kinow@ranma:/tmp$ pip install -U posuto
Collecting posuto
  Downloading https://files.pythonhosted.org/packages/af/97/8626d71e45e3f38bec91dd7558acd9b40246892aa66ce16531296a58e708/posuto-0.4.0.tar.gz (6.7MB)
     |████████████████████████████████| 6.7MB 1.3MB/s 
Installing collected packages: posuto
  Running setup.py install for posuto ... done
Successfully installed posuto-0.4.0
WARNING: You are using pip version 19.2.3, however version 21.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
(venv) kinow@ranma:/tmp$ python
Python 3.8.3 (default, May 19 2020, 18:47:26) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import posuto
>>> posuto.get('〒869-0198')
OfficeCode(jis='43368', kana='ナガスマチヤクバ', name='長洲町役場', prefecture='熊本県', city='玉名郡長洲町', neighborhood='大字長洲', banchi='2766', postal_code='8690198', old_code='86901', post_office='長洲', type='office', multiple=False, new=False, alternates=[])
>>>

polm commented 3 years ago

> How did you figure it out? Had a look at that page (with my broken Japanese) and couldn't see an explanation of how to parse that code in the page or PDF files.

Well, first I checked what jisx0402 was. That led me to the reference page, which doesn't use the term jisx0402, and I saw that it used six-digit codes throughout. Then I checked the Wikipedia article, which mentioned the check digit, but when I calculated the check digit for the first entry in the offices file it didn't match up. Then I found the README on the reference page, which has some special rules about the check digit buried in it. With those rules applied it matched up, and I was able to confirm the codes were the same.

It is not well organized data. :/

Closing since this seems to work for now.

kinow commented 3 years ago

Thanks for fixing and for the explanation. Kudos on the detective work!!!