symerio / pgeocode

Postal code geocoding and distance calculation
https://pgeocode.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

ValueError on CA, UK and IE postcodes #49

Open timothymonteath opened 3 years ago

timothymonteath commented 3 years ago

First off, this is a great package which has really helped me out! I have encountered a bug with CA, UK and IE postcodes. The error I am getting is:

ValueError                                Traceback (most recent call last)
<ipython-input-1174-3c5ae744b055> in <module>
----> 1 import codecs, os;__pyfile = codecs.open('''/tmp/pyjoGiOo''', encoding='''utf-8''');__code = __pyfile.read().encode('''utf-8''');__pyfile.close();os.remove('''/tmp/pyjoGiOo''');exec(compile(__code, '''/home/Downloads//Data_Cleaning/cleaning.py''', 'exec'));

~/Downloads//Data_Cleaning/cleaning.py in <module>
    500         lookup.columns =
    501 
--> 502 
    503 
    504 df1_missing_city = df1.groupby('Target country code').apply(lambda x: len(x[x['Target city'].isnull() == True]) / len(x) * 100)

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4198             else:
   4199                 values = self.astype(object)._values
-> 4200                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4201 
   4202         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

~/Downloads//Data_Cleaning/cleaning.py in <lambda>(x)
    500         lookup.columns =
    501 
--> 502 
    503 
    504 df1_missing_city = df1.groupby('Target country code').apply(lambda x: len(x[x['Target city'].isnull() == True]) / len(x) * 100)

~/Downloads//Data_Cleaning/cleaning.py in postcode_lookup(postcode, form)
    489                 'VA',
    490                 'VI',
--> 491                 'WF',
    492                 'YT',
    493                 'ZA']

~/anaconda3/lib/python3.8/site-packages/pgeocode.py in query_postal_code(self, codes)
    305 
    306         codes = self._normalize_postal_code(codes)
--> 307         response = pd.merge(
    308             codes, self._data_frame, on="postal_code", how="left"
    309         )

~/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     72     validate=None,
     73 ) -> "DataFrame":
---> 74     op = _MergeOperation(
     75         left,
     76         right,

~/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    654         # validate the merge keys dtypes. We may need to coerce
    655         # to avoid incompatible dtypes
--> 656         self._maybe_coerce_merge_keys()
    657 
    658         # If argument passed to validate,

~/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py in _maybe_coerce_merge_keys(self)
   1163                     inferred_right in string_types and inferred_left not in string_types
   1164                 ):
-> 1165                     raise ValueError(msg)
   1166 
   1167             # datetimelikes must match exactly

ValueError: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat

I have traced this error back as far as _normalize_postal_code, whose output seems to trigger the merge error later in the query_postal_code function. However, I haven't been able to figure out what it is about splitting these postcodes that upsets pandas so much.
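
For anyone trying to reproduce the final ValueError without pgeocode, here is a minimal sketch using pandas alone. It is not a diagnosis of what _normalize_postal_code does. It only assumes, based on the error message, that the left-hand postal_code column reached the merge as float64 (for example, all NaN) while the reference table holds strings. The frames `reference` and `codes` and the helper `merge_error` are hypothetical names made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reference table: string postal codes (object dtype).
reference = pd.DataFrame(
    {
        "postal_code": pd.Series(["A1A", "B2B"], dtype=object),
        "place_name": ["Place A", "Place B"],
    }
)

# Hypothetical stand-in for input codes that ended up as all-NaN float64
# (an assumption about the failure mode, not confirmed from pgeocode itself).
codes = pd.DataFrame({"postal_code": pd.Series([np.nan, np.nan], dtype="float64")})


def merge_error(left, right):
    """Return the ValueError message from a left merge on postal_code, or None."""
    try:
        pd.merge(left, right, on="postal_code", how="left")
    except ValueError as exc:
        return str(exc)
    return None


# The key dtypes differ: float64 on the left, object on the right.
print(codes["postal_code"].dtype, reference["postal_code"].dtype)
print(merge_error(codes, reference))

# Casting the float64 key to object makes the key dtypes compatible,
# so the merge runs (the NaN keys simply find no match).
codes_obj = codes.astype({"postal_code": object})
print(merge_error(codes_obj, reference))
```

If this matches what is happening, checking the dtype of the column returned by _normalize_postal_code for the CA, UK and IE inputs would confirm or rule out the assumption.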