johny-b opened this issue 4 years ago
A KeyError I encountered tonight is very similar. As I write this, I'm thinking I'll first try to fix it with some exception handling, or perhaps my design for aggregating this historical data just needs a rethink :). Either way, this post may provide helpful details for future improvements.
Context: I am analyzing COVID data by U.S. county, keyed on the 5-digit FIPS code and a date timestamp. I was taking a current list of counties with their cases and fatalities, then doing `.loc` lookups into a historical dataset (which also has a MultiIndex on FIPS code and date timestamp) to add columns for cases and fatalities in the same county one month ago, two months ago, and so on.
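The per-county `.loc` lookup described above could also be done as one left merge per lag, which leaves NaN instead of raising KeyError when a county has no row on the prior date. This is a minimal sketch, not the poster's code: it assumes the historical frame has plain `fips`, `date`, `cases` and `deaths` columns (the printed rows suggest they exist alongside the index), and `add_prior_counts` and the toy data are hypothetical.

```python
import pandas as pd


def add_prior_counts(dfstats, history, asof, days):
    """Left-merge cases/deaths from `days` before `asof` onto dfstats by FIPS code.

    Counties with no historical row on that date get NaN instead of a KeyError.
    Assumes `history` has plain 'fips', 'date', 'cases', 'deaths' columns.
    """
    prior_date = asof - pd.Timedelta(days=days)
    prior = history.loc[history["date"] == prior_date, ["fips", "cases", "deaths"]]
    # Drop any (fips, date) index so 'fips' is only a column and the merge key is unambiguous
    prior = prior.reset_index(drop=True)
    prior = prior.rename(columns={"cases": f"cases_{days}", "deaths": f"deaths_{days}"})
    # validate= makes pandas raise if a FIPS code appears twice on the prior date
    return dfstats.merge(prior, on="fips", how="left", validate="many_to_one")


# Hypothetical toy data; only the column names follow the post.
history = pd.DataFrame({
    "fips": ["02220", "02220", "02230"],
    "date": pd.to_datetime(["2020-09-19", "2020-10-19", "2020-10-19"]),
    "cases": [55, 80, 3],
    "deaths": [0, 1, 0],
})
dfstats = pd.DataFrame({"fips": ["02220", "02230"]})

asof = history["date"].max()
dfstats = add_prior_counts(dfstats, history, asof, days=30)
print(dfstats)
```

Calling it again with `days=60`, `days=90` and so on would add the other lag columns; `validate="many_to_one"` makes pandas raise a `MergeError` if a FIPS code appears more than once on the prior date.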
Python raises a KeyError when a few county records are missing from my historical dataset. Here is my code snippet, followed by print statements I added for debugging, and finally the error output:
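```python
import datetime as dt

fipslist = list(dfstats.fips.unique())
asof = df.date.max()
priormth = asof - dt.timedelta(days=30)  # same prior date for every county
for x in fipslist:
    # df = historical pd.DataFrame with a MultiIndex on (fips, date)
    prior_row = df.loc[(str(x), priormth)]  # KeyError if this (fips, date) pair is missing
    # .at is for single labels; a boolean row mask needs .loc
    dfstats.loc[dfstats['fips'] == x, 'cases_30'] = prior_row['cases']
    dfstats.loc[dfstats['fips'] == x, 'deaths_30'] = prior_row['deaths']
```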
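If the loop is kept, the exception-handling route mentioned at the top could look like this minimal sketch. `prior_value` is a hypothetical helper and the data are made up; the missing pair mirrors the Alaska county `02230` from the debug output. The lookup uses `.loc[(fips, date), column]` and turns a `KeyError` into NaN.

```python
import numpy as np
import pandas as pd


def prior_value(history, fips, when, column):
    """Return history.loc[(fips, when), column], or NaN if that (fips, date) pair is missing."""
    try:
        return history.loc[(fips, when), column]
    except KeyError:
        # The (fips, date) pair is not in the MultiIndex, like '02230' in the post
        return np.nan


# Hypothetical history indexed on (fips, date), as in the post; values are made up.
history = (
    pd.DataFrame({
        "fips": ["02220", "02220", "02230"],
        "date": pd.to_datetime(["2020-09-19", "2020-10-19", "2020-10-19"]),
        "cases": [55, 80, 3],
        "deaths": [0, 1, 0],
    })
    .set_index(["fips", "date"])
    .sort_index()
)
dfstats = pd.DataFrame({"fips": ["02220", "02230"]})

asof = history.index.get_level_values("date").max()
priormth = asof - pd.Timedelta(days=30)
for col in ["cases", "deaths"]:
    # One lookup per county; missing pairs become NaN instead of stopping the loop
    dfstats[f"{col}_30"] = [prior_value(history, f, priormth, col) for f in dfstats["fips"]]
print(dfstats)
```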
[Print statements I added to my code: it ran fine until it tried a `.loc` lookup for an Alaska county FIPS code that had no entry.]
[Processing correctly for this record:]
```
Name: (02198, 2020-09-19 00:00:00), Length: 7, dtype: object
value priormth =2020-09-19 00:00:00
value fips =02220
PRIOR_ROW =fips    02220
date      2020-09-19 00:00:00
county    Sitka City and Borough
state     Alaska
cases     55
deaths    0
pop       NaN
```
[It choked: it could not locate the tuple ('02230', '2020-09-19') in the (fips, date) MultiIndex.]
```
Name: (02220, 2020-09-19 00:00:00), Length: 7, dtype: object
value priormth =2020-09-19 00:00:00
value fips =02230
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "
```
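The traceback is a plain `KeyError` from a scalar `.loc` lookup of a `(fips, date)` pair that is not in the index. When many pairs are needed at once, `DataFrame.reindex` with the wanted pairs returns one row for each of them and fills the missing ones with NaN. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical (fips, date)-indexed history; values are made up.
history = pd.DataFrame(
    {"cases": [55, 3], "deaths": [0, 0]},
    index=pd.MultiIndex.from_tuples(
        [("02220", pd.Timestamp("2020-09-19")), ("02230", pd.Timestamp("2020-10-19"))],
        names=["fips", "date"],
    ),
)
when = pd.Timestamp("2020-09-19")

# A scalar .loc lookup of a missing (fips, date) pair raises KeyError, as in the traceback
try:
    history.loc[("02230", when), "cases"]
    missing_raises = False
except KeyError:
    missing_raises = True

# reindex returns one row per requested pair and fills missing pairs with NaN
wanted = pd.MultiIndex.from_product([["02220", "02230"], [when]], names=["fips", "date"])
prior = history.reindex(wanted)
print(prior)
```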
I'm not submitting this strictly as a bug, because this is so messed up that I seriously suspect tuples in indexes simply never work, but I don't think the docs are clear about it.
I'm using Python 3.6.9 and pandas==1.1.3.
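On tuples in indexes: one documented behaviour that can be surprising is that `pd.Index` converts a list of tuples into a `MultiIndex` by default (`tupleize_cols=True`); passing `tupleize_cols=False` keeps each tuple as a single label in a flat index. A minimal sketch with made-up labels, not the examples that follow:

```python
import pandas as pd

pairs = [("02220", "2020-09-19"), ("02230", "2020-09-19")]

# Default: a list of tuples is converted into a two-level MultiIndex
auto = pd.Index(pairs)

# tupleize_cols=False keeps a flat object Index whose labels are the tuples themselves
flat = pd.Index(pairs, tupleize_cols=False)

print(type(auto).__name__, auto.nlevels)  # MultiIndex 2
print(type(flat).__name__, flat.nlevels)
```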
Example 1: works for `first` and not for `second`:

Example 2: works for the first row only: