symerio / pgeocode

Postal code geocoding and distance calculation
https://pgeocode.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Inconsistency in "country code" column name #33

Closed dword4 closed 4 years ago

dword4 commented 4 years ago

The fields returned by query_postal_code() are named inconsistently: "country code" contains a space while the other fields use underscores, so the same access style works for some fields and raises errors for others.

pgeocode==0.2.1 pandas==1.0.3

Here is the creation of the object with a postal code

Python 3.6.8 (default, Apr 25 2019, 21:02:35)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pgeocode
>>> code = "M4B 1B3"
>>> loc = pgeocode.Nominatim('ca')
>>> s = loc.query_postal_code(code)
>>> print(s)
postal_code                                                M4B
country code                                                CA
place_name        East York (Parkview Hill / Woodbine Gardens)
state_name                                             Ontario
state_code                                                  ON
county_name                                         East York
county_code                                                NaN
community_name                                             NaN
community_code                                             NaN
latitude                                               43.7063
longitude                                             -79.3094
accuracy                                                     6
Name: 0, dtype: object

Now I try to access s.country_code and get an error:

>>> print(s.country_code)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python3.6/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'country_code'

Another field with the same underscore-style name works:

>>> print(s.postal_code)
M4B

Now using the manner that is often seen with Pandas

>>> print(s['country code'])
CA

But when I try this for postal code it throws a different error

>>> print(s['postal code'])
Traceback (most recent call last):
  File "/usr/local/lib64/python3.6/site-packages/pandas/core/indexes/base.py", line 4411, in get_value
    return libindex.get_value_at(s, key)
  File "pandas/_libs/index.pyx", line 44, in pandas._libs.index.get_value_at
  File "pandas/_libs/index.pyx", line 45, in pandas._libs.index.get_value_at
  File "pandas/_libs/util.pxd", line 98, in pandas._libs.util.get_value_at
  File "pandas/_libs/util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python3.6/site-packages/pandas/core/series.py", line 871, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/local/lib64/python3.6/site-packages/pandas/core/indexes/base.py", line 4419, in get_value
    raise e1
  File "/usr/local/lib64/python3.6/site-packages/pandas/core/indexes/base.py", line 4405, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'postal code'
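
The behaviour above can be reproduced with pandas alone. This is a minimal sketch, not pgeocode code: the Series is a hand-built stand-in for the query result (three fields copied from the output above), and the `rename` call is only one possible workaround on the caller's side.

```python
import pandas as pd

# Hypothetical stand-in for the Series returned by query_postal_code();
# values are copied from the output above, only three fields are kept.
s = pd.Series(
    {"postal_code": "M4B", "country code": "CA", "state_code": "ON"},
    name=0,
)

# Attribute access only resolves labels that are valid Python identifiers.
print(s.postal_code)               # works: the label uses an underscore
print(hasattr(s, "country_code"))  # no such label: the real one has a space
print(s["country code"])           # bracket access needs the exact label

# Possible caller-side workaround: replace spaces with underscores
# in every index label, then use either access style.
normalized = s.rename(lambda label: label.replace(" ", "_"))
print(normalized.country_code)
```

After the rename, every label follows the underscore convention, so `normalized.country_code` and `normalized["country_code"]` both resolve.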

rth commented 4 years ago

Thanks @dword4! Absolutely, that's a typo in the field name. It should have been country_code (with _) here https://github.com/symerio/pgeocode/blob/5b5d68c3f9b468e04dc3211f9f91aaa453a38db5/pgeocode.py#L22

Would you be interested in making a Pull Request to fix it?

dword4 commented 4 years ago

Sure, I will take a crack at it. I will try to get it up sometime this evening.

dword4 commented 4 years ago

https://github.com/symerio/pgeocode/pull/35 here we go

rth commented 4 years ago

Fixed in #35. Thanks!

leducvin commented 3 years ago

Hi, somehow this is still present in v0.3.0.

I checked out the v0.3.0 tag, installed pgeocode in a virtualenv, installed pytest and ran the tests:

pytest -x
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /home/vleduc/temp/pgeocode
collected 100 items                                                            

test_pgeocode.py .....F

=================================== FAILURES ===================================
___________ test_countries[CA-M5R 1X8-Toronto-H2Z 1A7-Montreal-503] ____________

country = 'CA', pc1 = 'M5R 1X8', location1 = 'Toronto', pc2 = 'H2Z 1A7'
location2 = 'Montreal', distance12 = 503

    @pytest.mark.parametrize(
        "country, pc1, location1, pc2, location2, distance12",
        [
            ("FR", "91120", "Palaiseau", "67000", "Strasbourg", 400),
            ("GB", "WC2N 5DU", "London", "BT1 5GS", "Belfast", 518),
            # ('AR', 'c1002', 'Buenos-Aires', '62091', 'Rio-Negro', 965), known failure   # noqa
            ("AU", "6837", "Perth", "3000", "melbourne", 2722),
            ("AU", "6837", "Perth", "0221", "Barton", 3089),
            ("US", "60605", "Chicago", "94103", "San Francisco", 2984),
            ("CA", "M5R 1X8", "Toronto", "H2Z 1A7", "Montreal", 503),
            ("IE", "D01 R2PO", "Dublin", "T12 RW26", "Cork", 219),
        ],
    )
    def test_countries(country, pc1, location1, pc2, location2, distance12):
        if country == "IE":
            pytest.xfail("TODO: Investigate failure for IE")
        nomi = Nominatim(country)

        res = nomi.query_postal_code(pc1)
        assert isinstance(res, pd.Series)
        assert _normalize_str(location1) in _normalize_str(res.place_name)

>       assert "country_code" in res.index
E       AssertionError: assert 'country_code' in Index(['postal_code', 'country code', 'place_name', 'state_name', 'state_code',\n       'county_name', 'county_code', 'community_name', 'community_code',\n       'latitude', 'longitude', 'accuracy'],\n      dtype='object')
E        +  where Index(['postal_code', 'country code', 'place_name', 'state_name', 'state_code',\n       'county_name', 'county_code', 'community_name', 'community_code',\n       'latitude', 'longitude', 'accuracy'],\n      dtype='object') = postal_code                                                     M5R\ncountry code                                      ...                    -79.4035\naccuracy                                                        6.0\nName: 0, dtype: object.index

test_pgeocode.py:55: AssertionError
=========================== short test summary info ============================
FAILED test_pgeocode.py::test_countries[CA-M5R 1X8-Toronto-H2Z 1A7-Montreal-503]
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
========================= 1 failed, 5 passed in 1.22s ==========================
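
For code that has to run against results with either label, a tolerant lookup may help in the meantime. The sketch below uses plain pandas; `get_field` is a hypothetical helper, not part of pgeocode, and the two Series are hand-built stand-ins for the two index layouts seen in this thread.

```python
import pandas as pd


def get_field(series: pd.Series, name: str):
    """Return series[name], falling back to the same label spelled with spaces.

    Hypothetical helper: it tries the underscore label first, then the
    space-separated variant (e.g. "country code").
    """
    if name in series.index:
        return series[name]
    legacy = name.replace("_", " ")
    if legacy in series.index:
        return series[legacy]
    raise KeyError(name)


# Stand-ins for the two layouts: the one in the failing test output
# and the one the test expects.
old_style = pd.Series({"postal_code": "M5R", "country code": "CA"})
new_style = pd.Series({"postal_code": "M5R", "country_code": "CA"})

print(get_field(old_style, "country_code"))
print(get_field(new_style, "country_code"))
```

This only hides the naming difference on the caller's side; it does not explain why the space-separated label still shows up in v0.3.0.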