Open mattrehbein opened 6 years ago
I like this exploration, the two charts are clear! Maybe it's better to add a legend and a title to the first chart?
I've cleaned up my original graph, but haven't figured out how to plot my racial, age and gender breakdowns yet (more on that under problems below). A lot of my time has been spent, and sadly wasted, on looking through secondary datasets (more on that under changes in direction below).
I know the red is a bit aggressive, but so are gangs, right?
Probably -- I've explored two more datasets about crime in Chicago, but neither provided anything useful to compare with the CPD. I'll keep looking, but as I think about it, it seems unlikely that I'll find gang-related arrest data from an independent source, as the police are the ones who generate that info.
So, I'm still looking for additional data to work in, but the project may simply end up being a presentation of the gang landscape the CPD database presents.
I've been trying to plot some stacked charts to break down gang membership by race, age, and gender, but I haven't been able to get it to work yet, as I'm still trying to grasp fully how pandas and matplotlib work together. Grouping by one column's value counts, then plotting value counts of another is what I'm after but stumbling over.
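One way to get a "group by one column, count another" table is `pd.crosstab`, sketched below. The column names `gang` and `race` and the toy rows are assumptions for illustration, not the actual headers or values in the CPD file.

```python
import pandas as pd

# Hypothetical toy rows; "gang" and "race" are assumed column names.
records = pd.DataFrame(
    {
        "gang": ["A", "A", "A", "B", "B", "C"],
        "race": ["BLACK", "BLACK", "WHITE", "WHITE", "WHITE", "BLACK"],
    }
)

# Rank gangs by number of rows and keep the largest ones.
top_gangs = records["gang"].value_counts().head(2).index

# Rows = gang, columns = race, cells = number of entries.
counts = pd.crosstab(records["gang"], records["race"]).loc[top_gangs]


def plot_stacked(table):
    """Draw one stacked bar per row of `table` (needs matplotlib installed)."""
    ax = table.plot(kind="bar", stacked=True)
    ax.set_ylabel("entries in database")
    return ax
```

Calling `plot_stacked(counts)` should draw one bar per gang, split by race. `records.groupby("gang")["race"].value_counts().unstack(fill_value=0)` builds the same kind of table if the groupby form reads more naturally.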
I've made my stacked graph for racial breakdown by gang, but I'm not sure it's all that interesting. What seems to make a stronger point is comparing the racial breakdown of the CPD's gang database and the racial breakdown of the city as a whole. Below are my first crude attempts at graphing this (didn't have time to fix obvious issues like overlapping labels before the update deadline).
I'm including both bar graphs and the verboten pie charts -- I haven't decided which is the best way to compare these two, though I'm leaning toward the bars. Again these are super crude bc I only started looking at this angle today, but I would love any suggestions anyone has on what's the best way to show how the gang db does not reflect the city's overall demographics.
First here's the stacked chart by gang, which I was efforting from last update but may omit from final:
Now the comparison graphs of alleged gang members vs all Chicagoans:
alleged gang members:
entire city:
gang pie:
city pie:
Small shift in what to focus on in the graphs, as described above.
I'm working on a graph that combines them, with one stacked bar showing race for the gang numbers, and one stacked bar showing race for overall Chicago population. The dataframes are too incongruent to join, though, I think, so I'm not sure how to go about this yet.
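A join of the row-level dataframes may not be needed. A minimal sketch, assuming each source can first be reduced to a Series of shares per group: put the two Series side by side in one small table and plot that. The numbers and labels below are made up for illustration and are not real figures.

```python
import pandas as pd

# Made-up shares for illustration only; these are NOT real figures.
gang_share = pd.Series({"Black": 0.6, "Hispanic": 0.3, "White": 0.1})
city_share = pd.Series(
    {"Black": 0.3, "Hispanic": 0.3, "White": 0.3, "Asian": 0.1}
)

# One row per population, one column per group. A group missing from
# one source shows up as NaN after alignment, so fill it with 0.
combined = pd.DataFrame(
    {"Gang database": gang_share, "Chicago": city_share}
).T.fillna(0)


def plot_comparison(table):
    """Draw one stacked bar per population (needs matplotlib installed)."""
    ax = table.plot(kind="bar", stacked=True)
    ax.set_ylabel("share of population")
    return ax
```

Calling `plot_comparison(combined)` should give two stacked bars that share one legend, since pandas aligns the two Series on their group labels.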
Very interesting work so far! Don't know if this might help you out, but when I've worked on similar projects, I've looked into patterns that might help the reader understand who is an average member of the gang. What is their average age, race and gender? Then I've compared that to the number of murders in that same age group, which gives the reader an idea of the way violence is affecting a certain population, or compared it with school dropout rates or any other variable that might indicate the vulnerability of that specific population. This might give perspective as to why there are so many people of a certain race in gangs.
About your first graph, it'd be useful to sort races in the legend box so it can be easier to locate the two most common ones. Maybe assign a color to each race so it's easier to see their evolution from chart to chart.
basic thing to know about race and ethnicity:
hispanic is not a race, it's an ethnicity. so hispanic may be any race. you can have black cubans or white cubans, for instance.
the city is counting race/ethnicity one way and the gang data are counted differently. so they are not easily compared.
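One common way to make two sources comparable is to collapse race and Hispanic origin into a single set of mutually exclusive labels before computing shares. The sketch below assumes the city figures can be broken out by both race and Hispanic origin. The function name, labels, and numbers are hypothetical and do not come from the Census or the CPD data.

```python
from collections import Counter


def combined_category(race, is_hispanic):
    """Collapse a race plus a Hispanic flag into one exclusive label."""
    if is_hispanic:
        return "Hispanic (any race)"
    return f"{race}, non-Hispanic"


# Hypothetical (race, is_hispanic, population) rows; not real figures.
rows = [
    ("White", False, 50),
    ("White", True, 20),
    ("Black", False, 25),
    ("Black", True, 5),
]

# Sum population under the combined labels so nobody is counted twice.
totals = Counter()
for race, is_hispanic, population in rows:
    totals[combined_category(race, is_hispanic)] += population
```

Whichever scheme is used, both datasets need to be recoded to the same labels before their percentages are put on one chart.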
Just over a month ago, a half dozen activist groups and four men sued the city of Chicago, its police department and various public officials over law enforcement's use of a database designed to track members of criminal gangs.
The federal class action lawsuit decries Chicago police's gang database as "arbitrary, discriminatory, over-inclusive, and error-ridden." The four named plaintiffs deny being gang members and argue that their inclusion in the database has hurt their job prospects and led to harassment from police.
The Chicago police department released scrubbed versions of its gang database in November and again in March, according to ProPublica, which offers a free download of the database via its data store.
Analysis of the released information corroborates much of what the lawsuit and other critics have alleged. (FULL TEXT IS ON WEBSITE)
Including screen shots of my two waffle charts bc my pngs are stripped of legends bc I built those in html.
Racial and ethnic breakdown of Chicago
Racial and ethnic breakdown of alleged gang members
Racial and ethnic breakdown of Chicago's biggest gangs, according to police database
Racial and ethnic breakdown of Chicago's biggest white-majority gangs, according to police database
Thanks to Sarah for flagging the race v ethnicity aspect, which made me realize that I had a wrong number from the Census and ensured my text in the project accurately labeled groups.
Headline: Gangs of Chicago, through police's flawed lens
Published website version: https://mattrehbein.github.io/Chicago/
Code repository: https://github.com/mattrehbein/data_studio/tree/master/code/03-project
Final data set(s): CPD via ProPublica
Coming up with the best graph types.
I'm pretty satisfied, but with more time, I would do similar analysis on the age breakdown of alleged gang members in the database, and work on some way to visualize the age of the entries in the database (to get at the criticism that many of the entries in the database are outdated).
I need to sort my two bar charts.
Pitch
Summary
I want to explore the pervasiveness of gangs in Chicago via the CPD's database of alleged gang members. I got the data set from ProPublica, which, along with several other news outlets, has published pieces reporting the flaws of the CPD database. I want to analyze and visualize some of this data to 1) Get an idea of the scope of gang activity in the city based on the police database and 2) Try to fact check the database a bit by comparing its numbers to other data sets.
I've got a second data set from data.gov on crime in Chicago going back to 2001. It's massive and I'm still poking around in it to see whether it has info I can pair with the gang db.
Possible headline(s): Police's view of gangs in Chicago
Data set(s): CPD via ProPublica and data.gov
Code repository: https://github.com/mattrehbein/data_studio/tree/master/code/03-project
Possible problems/fears/questions:
Work so far
I've downloaded the CPD data set and read it into pandas. I've analyzed it a bit to look at how many gangs are on record and how many members are in each (again, as with everything here, according to the police numbers). I've played around with the graphs a little. I want to look at the bigger gangs' demographic breakdown but haven't graphed that yet. I'm thinking I'll use stacked bars for that but I'll look around for other ways to visualize demos in a group.
I've also read ProPublica's various pieces on the flaws in the dataset and what police have told them about it. I'm using this as a guide to clean the data and to take a few things into consideration when working with it.
Maybe something like this:
Checklist
This checklist must be completed before you submit your draft.