lvphj / epydemiology

Python code for epidemiologists – eventually
MIT License

Checking postcodes with dictionary method #20

Open lvphj opened 6 years ago

lvphj commented 6 years ago

If a dataframe containing a column of postcodes is checked using the epy.phjCleanUKPostcodeVariable() function with phjCheckByOption = 'dictionary', an error may occur if the column contains no incorrect postcodes. This may be because phjScratchDF, at line 1053 of phjCleanUKPostcodes.py, is None (this still needs to be checked). The error message produced under these circumstances is given below:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-bdeeaccc8d42> in <module>()
     13                                             phjCheckByOption = 'dictionary',
     14                                             phjDropExisting = True,
---> 15                                             phjPrintResults = True)

/anaconda3/envs/python36_venv/lib/python3.6/site-packages/epydemiology/phjCleanUKPostcodes.py in phjCleanUKPostcodeVariable(phjTempDF, phjRealPostcodeSer, phjOrigPostcodeVarName, phjNewPostcodeVarName, phjNewPostcodeStrLenVarName, phjPostcodeCheckVarName, phjMissingValueCode, phjMinDamerauLevenshteinDistanceVarName, phjBestAlternativesVarName, phjPostcode7VarName, phjPostcodeAreaVarName, phjSalvageOutwardPostcodeComponent, phjCheckByOption, phjDropExisting, phjPrintResults)
    243                                                                   phjNewPostcodeStrLenVarName = phjNewPostcodeStrLenVarName,
    244                                                                   phjPostcodeCheckVarName = phjPostcodeCheckVarName,
--> 245                                                                   phjMinDamerauLevenshteinDistanceVarName = phjMinDamerauLevenshteinDistanceVarName)
    246 
    247 

/anaconda3/envs/python36_venv/lib/python3.6/site-packages/epydemiology/phjCleanUKPostcodes.py in phjGetBestAlternativePostcodes(phjTempDF, phjRealPostcodeArr, phjNewPostcodeVarName, phjNewPostcodeStrLenVarName, phjPostcodeCheckVarName, phjMinDamerauLevenshteinDistanceVarName, phjBestAlternativesVarName)
   1059                                                                                                            phjRealPostcodeArr = phjRealPostcodeArr,
   1060                                                                                                            phjNewPostcodeVarName = phjNewPostcodeVarName,
-> 1061                                                                                                            phjAllowedEdits = 1),axis = 1)
   1062 
   1063     phjTempDF.update(phjScratchDF)

/anaconda3/envs/python36_venv/lib/python3.6/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   2512 
   2513         if isinstance(key, (Series, np.ndarray, list, Index)):
-> 2514             self._setitem_array(key, value)
   2515         elif isinstance(key, DataFrame):
   2516             self._setitem_frame(key, value)

/anaconda3/envs/python36_venv/lib/python3.6/site-packages/pandas/core/frame.py in _setitem_array(self, key, value)
   2536             if isinstance(value, DataFrame):
   2537                 if len(value.columns) != len(key):
-> 2538                     raise ValueError('Columns must be same length as key')
   2539                 for k1, k2 in zip(key, value.columns):
   2540                     self[k1] = value[k2]

ValueError: Columns must be same length as key

lvphj commented 2 years ago

The error seems to occur when the function attempts to list possible alternative postcodes. For example, consider the first postcode entry, AB123CD. The returned list of edits is: [1, ['AB123XD', 'AB123DD', 'AB123FD']]
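
One possible way to make the shape of each row's result predictable is sketched below. It is not the code in phjCleanUKPostcodes.py; the helper names split_edit_result and results_to_columns are hypothetical and used only for illustration. The sketch assumes each row yields either None or a two-item list of the form [distance, [alternatives]], as in the example above, and normalises every row to exactly two values so that the number of values always matches the two target columns.

```python
from typing import Any, List, Optional, Sequence, Tuple


def split_edit_result(result: Optional[Sequence[Any]]) -> Tuple[Any, Any]:
    """Normalise one row's result to exactly (distance, alternatives).

    Hypothetical helper: anything that is not a two-item sequence
    (for example None) becomes (None, None).
    """
    if result is None or len(result) != 2:
        return (None, None)
    distance, alternatives = result
    return (distance, alternatives)


def results_to_columns(results: Sequence[Optional[Sequence[Any]]]) -> Tuple[List[Any], List[Any]]:
    """Turn per-row results into two parallel lists, one per target column."""
    pairs = [split_edit_result(r) for r in results]
    distances = [d for d, _ in pairs]
    alternatives = [a for _, a in pairs]
    return distances, alternatives


# First entry mirrors the example in this comment; the second stands in
# for a row where no result was produced (an assumption for illustration).
results = [
    [1, ['AB123XD', 'AB123DD', 'AB123FD']],
    None,
]
distances, alternatives = results_to_columns(results)
print(distances)
print(alternatives)
```

Each of the two lists could then be stored in its own column, one at a time (for example as a pandas.Series aligned to the dataframe's index), rather than assigning a single multi-column result to both columns at once.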