jsoma / data-studio-projects


NYPD vs Chicago PD fireable offenses #242

Open vpenney opened 6 years ago

vpenney commented 6 years ago

Pitch

Summary

Back in April, BuzzFeed published a batch of about 1,800 of the NYPD's disciplinary records from 2011-2015. These records were actually publicly available (albeit not online) until 2016, and the NYPD is fighting police unions in court to make "deidentified" disciplinary records public again.

I want to compare the disciplinary penalties of the NYPD and Chicago PD to see if the NYPD is more lenient with its officers. It looks like even if you miss 184 days of work and demonstrate that you are "incompetent to continue in service at the New York City Police Department," you only get ten vacation days docked. Let's see how Chicago handles it.

Here are some fun visualizations I found:

I could use area to show the frequency with which NYPD officers are fired vs CPD officers: [image: ethos3 creative visualization]

I could use bubbles for ... something. [image]

Details

Possible headline(s): The NYPD and Chicago PD can't seem to agree on "fireable offenses" (if there are discrepancies) OR The NYPD and Chicago PD's definitions of fireable offenses (if the two departments align)

Data set(s): BuzzFeed NYPD records, Chicago PD disciplinary records

Code repository: Repository is here

Possible problems/fears/questions: Fears:

Work so far

So far, I've mainly just done research and dug up all of my resources. Next steps are getting the data cleaned and set up in dataframes so I can analyze it.

Checklist

This checklist must be completed before you submit your draft.

sarahslo commented 6 years ago

just looked at the nypd documents and i'm not sure how you are going to do this. it seems like each case has a bunch of offenses attached to it, and they are basically a summary of bad behavior. it's not like, here's a list of things you can get in trouble for. it's just, here's the thing you did that was wrong, and like the people who did them, they are all different things.

separately i'm curious as to how many vacation days these guys get? i see one guy having more than 30 vacation days docked. can he just earn them back by working overtime? do they all have like, a million vacation days in the bank? how does this work?

vpenney commented 6 years ago

Update

No surprises: this project is a mess.

Visuals

So far, I have the Chicago data in a pretty good place. These are all of the officers who were accused of crimes and either quit or were fired between 2011 and 2015. Those years correspond with the NYPD data.

[image: screenshot taken 2018-08-07]

I also have a graph of the total number of officers charged with misconduct and am working on graphing that side-by-side with the guilty charges, for comparison.

I finally have the NYPD pdfs in text, so I just need to wrap up some regex and then analyze that data.

Any changes in direction or topic?

Nope! Still unwaveringly dedicated to seeing this through.


Problems/Questions

Analyzing the NYPD data is going to be difficult. Often, officers are charged with more than one offense, so what I'll probably do is just use, you know, a ton of "if" statements with regex and create a new row in a dataframe for each charge, but tie all of the charges to the same case number. That way, I can sort by "guilty" and "not guilty" verdicts and use some more regex to identify the primary charges and compare those to the Chicago charges. Woohoo!
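Here's a rough sketch of the one-row-per-charge idea, using only the standard library (each dict stands in for a dataframe row). The layout of the converted PDFs isn't shown in this thread, so the "Case No." line, the numbered charges, and the trailing GUILTY / NOT GUILTY verdict are all assumptions, and `split_charges` is a hypothetical helper name.

```python
import re

# Hypothetical record text: the real layout of the converted NYPD PDFs is not
# shown in this thread, so this "Case No." / numbered-charge format is assumed.
SAMPLE = """\
Case No. 2013-1234
1. Excessive force against a civilian. GUILTY
2. Failure to file a report. NOT GUILTY
Case No. 2014-0042
1. Absent without leave. GUILTY
"""

# Matches the (assumed) header line that starts each case.
CASE_RE = re.compile(r"^Case No\.\s*(\S+)", re.MULTILINE)
# Matches one (assumed) numbered charge line ending in a verdict.
CHARGE_RE = re.compile(r"^\d+\.\s*(.+?)\.\s*(NOT GUILTY|GUILTY)\s*$", re.MULTILINE)


def split_charges(text):
    """Return one row (dict) per charge, each tied to its case number."""
    rows = []
    # re.split with one capture group yields
    # [preamble, case_number, body, case_number, body, ...]
    parts = CASE_RE.split(text)
    for case_number, body in zip(parts[1::2], parts[2::2]):
        for charge, verdict in CHARGE_RE.findall(body):
            rows.append({"case": case_number, "charge": charge, "verdict": verdict})
    return rows


rows = split_charges(SAMPLE)
# Filter by verdict once every charge has its own row.
guilty = [row for row in rows if row["verdict"] == "GUILTY"]
```

Splitting on the case header first keeps every charge tied to its case number, so a list like `rows` could be handed straight to a dataframe constructor and sorted by verdict from there.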


vpenney commented 6 years ago

Final

Project visuals/text

[image: police-department-area-charts]

Details

Headline: NYPD Officer Discipline Doesn't Add Up

Published website version: https://vpenney.github.io/police-discipline/

Code repository: https://github.com/vpenney/data-studio

Final data set(s): https://github.com/vpenney/data-studio

What did you find to be the most difficult part of this project?

Without a doubt, it was trying to get the Chicago and NYPD data to a point where they could be compared. It turns out that that was a fool's errand, because the NYPD has a separate record system for terminated employees and none of those records were leaked.

It was also difficult to find a way to simplify the data in a coherent manner, which is why I ended up with just one visual and also why I did the whole thing in Illustrator.

I need 20-30 more hours to finish cleaning the NYPD data by hand, but ultimately, I think it would be amazing (depressing?) to look at the average number of vacation days docked for "excessive force" vs. "criminal association" and see if there's any relationship between the crime and punishment. I am willing to bet one very shiny nickel that there isn't.
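For what it's worth, once the data is clean, the comparison itself would be short. A minimal sketch, assuming each cleaned row has a charge label and a count of days docked: the field names and the helper `average_days_by_charge` are hypothetical, and the day counts below are made-up placeholders, not figures from the NYPD records.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cleaned rows. The charge labels come from the paragraph above;
# the day counts are made-up placeholders, NOT real NYPD figures.
rows = [
    {"charge": "excessive force", "days_docked": 10},
    {"charge": "excessive force", "days_docked": 20},
    {"charge": "criminal association", "days_docked": 30},
]


def average_days_by_charge(rows):
    """Group rows by charge and average the vacation days docked."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["charge"]].append(row["days_docked"])
    return {charge: mean(days) for charge, days in grouped.items()}


averages = average_days_by_charge(rows)
```

If the averages came out roughly flat across charge types, that would support the shiny-nickel bet; the placeholder numbers here say nothing either way.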

Are you satisfied with what you produced? Is there anything you would like to change or improve?

Within the time constraints of this project, yes; overall, no. I think there's a lot more to find in this data, but it just takes forever to correct the typos in 550 poorly converted PDFs to the point where they can be regexed.


christina10211 commented 6 years ago

I love the visuals and the way you use colors in your annotations. I completely share the feeling when you have a messy dataset and you just spent hours and hours cleaning it up. Look forward to your new findings if you ever decide to revisit this project in the future!