invinst / chicago-police-data

a collection of public data re: CPD officers involved in police encounters
https://invisible.institute/police-data

Assign all POs in all_sworn a unique ID? #35

Closed by DGalt 6 years ago

DGalt commented 8 years ago

I'm currently putting together data sets for the individuals in the April and May dumps whom I can confidently ID as either police officers or non-officers, and I'm finding that there is currently no good way to uniquely ID a particular officer. A name alone isn't enough, so I end up having to combine several data sources to ID each one.

This is fine, except that when I want to go back and look at that officer again (or match them in another data set), I have to use those same sources to ID them all over again.

It might be worth assigning every officer in the all_sworn data set some kind of unique ID, so that when I identify someone in the April or May data set as a particular officer, I can tag that entry with that ID. I think it would make cross-referencing the different data sets easier. @rajivsinclair I know we don't have employee IDs, but have you all discussed assigning some equivalent ID number to the entries in all_sworn for this purpose?
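A minimal sketch of this proposal, in Python: each all_sworn row gets an integer ID once, and every identification made in the April or May dump is stored in a crosswalk keyed by (dump, record number), so the multi-source matching only has to be done once. The helpers `assign_ids` and `link`, the row keys, and the record numbers are all hypothetical placeholders, not anything that exists in the repository.

```python
from typing import Dict, Hashable, Iterable, Tuple

def assign_ids(all_sworn_keys: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Give each distinct all_sworn row key a sequential integer ID (1, 2, 3, ...)."""
    ids: Dict[Hashable, int] = {}
    for key in all_sworn_keys:
        if key not in ids:  # a repeated key keeps the ID it was given first
            ids[key] = len(ids) + 1
    return ids

# Hypothetical crosswalk: (dump name, record number in that dump) -> officer ID.
Crosswalk = Dict[Tuple[str, int], int]

def link(crosswalk: Crosswalk, dump: str, record_no: int, officer_id: int) -> None:
    """Record that a record in another dump was identified as this officer."""
    crosswalk[(dump, record_no)] = officer_id

# Example with made-up row keys and record numbers.
ids = assign_ids(["row-0001", "row-0002", "row-0003"])
crosswalk: Crosswalk = {}
link(crosswalk, "April", 17, ids["row-0002"])
link(crosswalk, "May", 942, ids["row-0002"])

# Later lookups reuse the stored match instead of redoing the multi-source work.
same_officer = crosswalk[("April", 17)] == crosswalk[("May", 942)]
```

One design caveat: sequential IDs like these are only stable if they are generated once and saved alongside all_sworn. Regenerating them from a re-sorted or updated file would reshuffle the numbers and break every link in the crosswalk.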

jilmun commented 8 years ago

Not sure if this helps, but in the Feb data it looks like the officers have a unique PERS_STAR_NO code: https://github.com/invinst/shootings-data/blob/master/Clean/Feb2016/dat_feb2016_officer.csv

DGalt commented 8 years ago

I'm assuming this is equivalent to Star1, Star2, etc. in all_sworn? The problem with star numbers is that they can change over time and be reused, right @rajivsinclair?

banoonoo2 commented 8 years ago

Yeah, their star numbers change when they get promoted (or demoted) and get a new badge. This blog explains the star number ranges for different ranks: http://chgopdfan.tripod.com/id12.html (Some star number changes don't seem to be explained by promotion or demotion, either.) The May data CSV has fields for up to 10 star numbers per individual. An individual's star number can change over time, and eventually each star number is reissued.
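To see the reuse problem directly in the data, one could collect each individual's star numbers from the multiple star fields and flag any number that shows up under more than one person. This sketch assumes the fields are named `Star1` through `Star10` (as in all_sworn) and that rows are plain dicts, such as those produced by `csv.DictReader`; the `person` key, the helper names, and all values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

def star_history(row: Dict[str, Optional[str]], max_stars: int = 10) -> List[str]:
    """Collect the non-empty Star1..StarN fields of one row, in order (assumed names)."""
    stars = []
    for i in range(1, max_stars + 1):
        value = (row.get(f"Star{i}") or "").strip()
        if value:  # skip missing or blank star fields
            stars.append(value)
    return stars

def reused_stars(rows: List[Dict[str, Optional[str]]], id_field: str) -> Dict[str, Set[str]]:
    """Return star numbers that appear in more than one individual's history."""
    holders: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        for star in star_history(row):
            holders[star].add(row[id_field])
    return {star: who for star, who in holders.items() if len(who) > 1}

# Invented example rows: star 12345 is held by both A and B.
rows = [
    {"person": "A", "Star1": "12345", "Star2": "1234"},
    {"person": "B", "Star1": "12345"},
    {"person": "C", "Star1": "6789", "Star2": ""},
]
flagged = reused_stars(rows, "person")
```

A flagged star number only says that two records share it; it doesn't distinguish a legitimately reissued badge from a data-entry error.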

I'm fairly certain you could create a unique ID by combining an individual's first star number and date of appointment. A duplicate seems very unlikely, and you'd have an ID tied to this and future CPD data rather than a number we assigned ourselves.
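A sketch of that composite key, assuming "first star number" means the `Star1` value and that the appointment date has already been parsed into a `datetime.date`. Since a duplicate is argued to be unlikely rather than impossible, the sketch also counts collisions so the assumption can be checked against the full file. `star_appt_key` and `duplicate_keys` are hypothetical names, and the records are invented.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple

def star_appt_key(first_star: str, appointed: date) -> str:
    """Composite key: first star number plus date of appointment (ISO format)."""
    return f"{first_star.strip()}-{appointed.isoformat()}"

def duplicate_keys(records: Iterable[Tuple[str, date]]) -> List[str]:
    """Return keys that occur more than once, i.e. where the key is NOT unique."""
    counts = Counter(star_appt_key(star, appt) for star, appt in records)
    return sorted(key for key, n in counts.items() if n > 1)

# Invented records: same star, different appointment dates, so no collision.
records = [
    ("12345", date(1999, 10, 25)),
    ("6789", date(1999, 10, 25)),
    ("12345", date(2009, 12, 16)),
]
collisions = duplicate_keys(records)
```

An empty `collisions` list on the real data would support using the key; any non-empty result points at the exact records that need another field to tell them apart.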

chaclynhunt commented 8 years ago

Hi all, I'm going to ask @rajivsinclair to chime in later with a more comprehensive answer, but for now:

the unique identifier we've been assigning follows this format:

first name - last name - middle initial - birth year - date of appointment - race - gender

examples:
JAMES-BANSLEY-A-1983-2009-12-16-WHITE-M
JOEL-BENTLEY-A-1976-1999-10-25-WHITE-M
KEVIN-CONNORS-M-1975-1999-09-13-WHITE-M
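The format above can be sketched as a small Python function that reproduces the example IDs. It assumes each field is already cleaned, and it upper-cases and trims every part to match the all-caps examples; that normalization, and the name `officer_uid`, are assumptions rather than a description of how the IDs are actually generated.

```python
from datetime import date

def officer_uid(first: str, last: str, middle_initial: str, birth_year: int,
                appointed: date, race: str, gender: str) -> str:
    """Build an ID in the FIRST-LAST-MI-BIRTHYEAR-YYYY-MM-DD-RACE-GENDER format."""
    parts = [first, last, middle_initial, str(birth_year),
             appointed.isoformat(), race, gender]
    # Assumed normalization: trim and upper-case so case/spacing differences
    # in the source files don't produce two IDs for one officer.
    return "-".join(p.strip().upper() for p in parts)

uid = officer_uid("James", "Bansley", "A", 1983, date(2009, 12, 16), "White", "M")
```

Because `-` is both the separator and part of the date (and can appear in hyphenated surnames), an ID like this works as an opaque key but shouldn't be split back into fields on `-`. Any later correction to one of the fields, such as a birth year, also yields a different ID.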

hieueastagile commented 8 years ago

@chaclyn @DGalt @banoonoo2 @rajivsinclair I think we can use the unique IDs that the CPDB database uses. The CPDB database is usually updated with all the data, so we think most of the sworn-officer data is already there. What do you all think?

DGalt commented 8 years ago

If there's already a system in place within the CPDB database, then yeah, I think it'd be best to just use that. Is this what you were referring to above, @chaclyn?