jswsean / ppol564_final_project_group4


Granularity of by-race, by-day data #1

Open · jswsean opened this issue 1 year ago

jswsean commented 1 year ago

For the regression discontinuity method, we were planning to use days since State's Attorney Kim Foxx's entry into office as our running variable. So we centered the sentencing date variable on Foxx's entry date and first looked at the granularity of the data.
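
The centering step looked roughly like this (a sketch; sentence_date is a stand-in for the actual datetime column in our data):

import pandas as pd

# Kim Foxx was sworn in as Cook County State's Attorney on December 1, 2016
foxx_entry = pd.Timestamp('2016-12-01')

# days since entry: negative before Foxx took office, positive after
sentencing_analysis['sa_timedelta_days'] = (
    sentencing_analysis['sentence_date'] - foxx_entry
).dt.days

We then grouped by day and race to count observations per cell: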

sentencing_byday = (
    sentencing_analysis
    .groupby(['sa_timedelta_days', 'is_black'])['is_incarcerated']
    .agg([('n', 'size')])
    .reset_index()
)
sentencing_byday

and our data looks like this:

[screenshot: first rows of sentencing_byday, with per-day, per-race counts n]

We wanted to look at the distribution of n within a bandwidth around Foxx's entry. This is the distribution for the range of -90 to 90 days since her entry:

sentencing_byday[(sentencing_byday.sa_timedelta_days >= -90) & 
                 (sentencing_byday.sa_timedelta_days <= 90)].n.describe()

===================
count    258.000000
mean      33.759690
std       18.116474
min        1.000000
25%       22.000000
50%       30.000000
75%       47.000000
max       88.000000  

Question: Is this by-race, by-day distribution too sparse for an RD approach? Should we use weeks or months instead of days?
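
If we switched to weeks, I assume we would just coarsen the running variable, something like:

# floor-divide the day counter: days 0-6 map to week 0, days -7 to -1 to week -1
sentencing_analysis['sa_timedelta_weeks'] = (
    sentencing_analysis['sa_timedelta_days'] // 7
)

sentencing_byweek = (
    sentencing_analysis
    .groupby(['sa_timedelta_weeks', 'is_black'])['is_incarcerated']
    .agg([('n', 'size')])
    .reset_index()
)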

rebeccajohnson88 commented 1 year ago

thanks for sharing and sorry for the delayed reply!

how many total observations are there in the 90-day bandwidth? i think the total sample size within the bandwidth matters more than the N observations per day
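
e.g. summing the per-day counts you already computed (a quick sketch off your sentencing_byday frame):

# total N (summed across race groups) within +/- 90 days of the cutoff
in_bw = sentencing_byday.sa_timedelta_days.between(-90, 90)
print(sentencing_byday.loc[in_bw, 'n'].sum())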

for the bandwidth selection, i think the current literature favors automated, data-driven bandwidth selection; a popular implementation is https://github.com/rdpackages/rdrobust

so you may want to check the suggested bandwidth with that software, then see the N defendants in that window
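
a minimal sketch with the python port of that package (pip install rdrobust), run on the defendant-level data rather than the daily counts; variable names are taken from your snippet above:

from rdrobust import rdbwselect

# data-driven (MSE-optimal) bandwidth selection around the cutoff c=0,
# i.e. Foxx's entry date in days-since-entry units
bw = rdbwselect(
    y=sentencing_analysis['is_incarcerated'],
    x=sentencing_analysis['sa_timedelta_days'],
    c=0
)
print(bw)

then you can filter to the suggested window with the same between() check as above to get the N defendants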