invinst / chicago-police-data

a collection of public data re: CPD officers involved in police encounters
https://invisible.institute/police-data

help to resolve unmatched officer identities #18

Closed: rajivsinclair closed this issue 6 years ago

rajivsinclair commented 8 years ago

In attempting to match the officers named in the shootings-append.csv file against the officer profiles in the all-sworn-officers datatable, we found 40 rows that are mismatched/malformed (missing data fields such as the first name of the accused officer, ACCSUEDOFFICER_FNAME [sic]).

I propose the following methodology for attempting to resolve them: _take the not_match_officer.csv file and work through it one row at a time to fill in the missing identifying information based on date-of-appointment matches in the all-sworn-officers table._

  1. take the date of appointment for the unmatched officer's row in not_match_officer.csv
  2. filter the table CPD_Employees-one-row-per-individual to show only CPD employees who have the exact same Date of Appointment as the unmatched officer, and export this filtered list to a new file
  3. look for close/confident matches and copy those officer profile rows into a new file along with a new field identifying whether the row is a proposed confident match or if it is close but not sufficiently confident without more information
  4. repeat for all rows in not_match_officer.csv
  5. save the new table of close+confident matches and add it to this branch or a new one (include your working files, e.g., exports of filtered lists of all officers with the same APPOINTED_DATE, only if you think they're helpful and not totally redundant)
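A minimal Python sketch of steps 1-5, assuming both tables are loaded as lists of dicts (for example with `csv.DictReader`), that both share an `APPOINTED_DATE` column in the same string format, and that both have a last-name column, here called `LAST_NAME` (a hypothetical header; the real files may differ). The "confident"/"review" tagging rule is also an assumption: only an exact last-name match is tagged confident, and every other same-date employee is left for a person to judge.

```python
import csv
from collections import defaultdict


def index_by_date(employees, date_col="APPOINTED_DATE"):
    """Step 2: group employee rows (dicts) by their appointment date."""
    by_date = defaultdict(list)
    for emp in employees:
        by_date[emp[date_col]].append(emp)
    return by_date


def propose_matches(unmatched, employees, date_col="APPOINTED_DATE", name_col="LAST_NAME"):
    """Steps 1, 3 and 4: tag every same-date employee for each unmatched row.

    Assumed rule: an exact last-name match is "confident"; any other
    same-date employee is "review" and still needs a human decision.
    """
    by_date = index_by_date(employees, date_col)
    proposals = []
    for row_num, officer in enumerate(unmatched):
        name = officer.get(name_col)
        for emp in by_date.get(officer[date_col], []):
            tag = "confident" if name and emp.get(name_col) == name else "review"
            proposals.append({**emp, "unmatched_row": row_num, "match_type": tag})
    return proposals


def save_proposals(proposals, path):
    """Step 5: write the proposed matches to a new CSV file."""
    if not proposals:
        return
    with open(path, "w", newline="") as f:
        # Assumes every employee row has the same columns (true for one DictReader).
        writer = csv.DictWriter(f, fieldnames=list(proposals[0]))
        writer.writeheader()
        writer.writerows(proposals)
```

Grouping employees by date once avoids re-filtering the whole employee table for every unmatched row; the `unmatched_row` field records which row of not_match_officer.csv each proposal belongs to.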
DGalt commented 8 years ago

I've started working on this. As a first pass, of the 40 records in no_match_officer.csv, the date-of-appointment for 25 of them does not have a corresponding record at all in the all-sworn-officer dataset. These dates are:

[Timestamp('1985-06-03 00:00:00'),
 Timestamp('1988-12-27 00:00:00'),
 Timestamp('1990-10-22 00:00:00'),
 Timestamp('1991-08-01 00:00:00'),
 Timestamp('1991-12-19 00:00:00'),
 Timestamp('1992-04-16 00:00:00'),
 Timestamp('1992-05-01 00:00:00'),
 Timestamp('1992-05-01 00:00:00'),
 Timestamp('1992-05-01 00:00:00'),
 Timestamp('1992-05-01 00:00:00'),
 Timestamp('1992-06-16 00:00:00'),
 Timestamp('1993-07-01 00:00:00'),
 Timestamp('1996-08-29 00:00:00'),
 Timestamp('1997-01-02 00:00:00'),
 Timestamp('1998-09-16 00:00:00'),
 Timestamp('1998-09-16 00:00:00'),
 Timestamp('2001-12-11 00:00:00'),
 Timestamp('2003-05-27 00:00:00'),
 Timestamp('2004-08-31 00:00:00'),
 Timestamp('2005-09-06 00:00:00'),
 Timestamp('2011-12-16 00:00:00'),
 Timestamp('2011-12-16 00:00:00'),
 Timestamp('2011-12-16 00:00:00'),
 Timestamp('2014-04-16 00:00:00'),
 Timestamp('2015-10-19 00:00:00')]
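The same check can be sketched in plain Python as a set lookup. This is not the code from DGalt's notebook; it assumes each table is a list of dicts with an `APPOINTED_DATE` column (hypothetical header) formatted identically in both files. It returns one date per unmatched row, which is why a date shared by several officers, such as 1992-05-01 above, shows up more than once.

```python
from collections import Counter


def dates_without_candidates(unmatched, sworn, date_col="APPOINTED_DATE"):
    """List the appointment date of every unmatched row whose date never occurs in `sworn`.

    One entry per row, so a date shared by several unmatched officers repeats.
    """
    sworn_dates = {row[date_col] for row in sworn}  # set lookup: one pass over each table
    return [row[date_col] for row in unmatched if row[date_col] not in sworn_dates]


def summarize(missing_dates):
    """Count how many unmatched rows fall on each date with no sworn-officer candidate."""
    return Counter(missing_dates)
```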
rajivsinclair commented 8 years ago

@DGalt thanks for beginning to look into this!

For these 25 out of the 40 unmatched officers from the shootings data, there are no officers in the "all" CPD employees tables? I wish I could say that I was surprised by this disappointment. We'll have to submit some kind of follow-up for these specific log numbers to get more useful identifying information and link them manually then. I'll chat with Chaclyn about including it in the queue for the next batch of FOIA requests to CPD.

For the remaining 15/40 unmatched officers, can you find any likely matches amongst known officers who have the same recorded date of appointment?

DGalt commented 8 years ago

That's correct, the appointment dates for those 25 records do not match any appointment dates in the all_sworn_officer dataset.

I have now gone through all of the records that have a corresponding appointment date within the all_sworn_officer data set. Of those, only 4 (5 really, since one is a duplicate), with last names ['ESCOBEDO', 'CONNORS', 'EDWARDS', 'FORGUE'], have obvious matches within the all_sworn_officer set. Within the no_match_officer dataset, the fields are complete for CONNORS, EDWARDS, and FORGUE. In contrast, ESCOBEDO is missing the first_name and unit fields, which can be filled in from all_sworn.
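Filling the blanks for a confirmed match like ESCOBEDO could look like the sketch below. It assumes both rows are dicts using the first_name and unit field names mentioned above; the sworn-officer file may use different headers. Only empty fields are filled, so any disagreement between the two sources stays visible instead of being silently overwritten.

```python
def fill_missing(unmatched_row, sworn_row, fields=("first_name", "unit")):
    """Copy values from the matched sworn-officer row into fields that are blank in the unmatched row.

    Blank means missing, None or an empty string. Existing values are never
    overwritten, and the input row is left unchanged (a copy is returned).
    """
    filled = dict(unmatched_row)
    for field in fields:
        if not filled.get(field) and sworn_row.get(field):
            filled[field] = sworn_row[field]
    return filled
```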

As for the rest of the names that have, at a minimum, a corresponding appointment date in all_sworn (['VALDES', 'JUDEH', 'SPURLIN', 'WELLERE', 'STAUNTON', 'WILLIAMSON', 'FRIERSON', 'JORDAN', 'TAIYOOB', 'WILLABY']), none of them correspond to an entry I can find in all_sworn, based on the information available (basically last name and unit number).

So, not great. We get a little bit more info on the ESCOBEDO entry, but that's basically it.

The code / steps for all this can be found here (I can submit a PR for it if desired): https://github.com/DGalt/shootings-data/blob/dev/Unmatched%20officers.ipynb

banoonoo2 commented 8 years ago

@rajivsinclair Could you get me the info for the remaining 25 records with the complaint number for each? If the complaints were included in the June 3 cache, I can comb through the PDFs and hopefully fill in any missing or mistyped information.

I searched the media-files index and DocumentCloud for each of the names and found a few clues.

I think some names are for civilian Detention Aides who work in the lockups. They're not sworn officers, so they don't have star numbers and won't appear in the all-sworn-officer dataset. Allegedly, Willaby was injured, so he appeared as a "victim" with extra details including his appointment date, which matches the unmatched-officers list. The others were simply listed as the Lockup Keeper or with lockup notes like "fed at 18:04."

Detention Aides/Lockup Keepers:

Other possible matches:

There are 26 PDFs that pop up for "Phillips," so a complaint number would really help!

Update: I recall from the PDFs that one of the incidents occurred/started in Harvey, IL, so some may be Harvey PD or some other department, not CPD officers or CPD civilians.

hieueastagile commented 8 years ago

Hi guys,

We are working on importing this data into the current CPDB database. Recently, we improved our importer for the sworn-officer dataset and got more accurate results when matching officers to the shootings data. Here are the updates:

So, in conclusion, about 46 rows are now unmatched. We put all of them, with their CRIDs, into a csv file; you can find it here. If you guys need more information about them, please let us know. Thanks.

banoonoo2 commented 8 years ago

@hieueastagile That's awesome! I tried opening the linked csv file. I see lots of files of code, but no data. Am I doing it wrong?

hieueastagile commented 8 years ago

@banoonoo2 So sorry, we didn't expect that saving the file with Excel would change its internal structure. If you can, please try opening it with Excel; we will try to upload the pure csv tomorrow.

DGalt commented 8 years ago

Alright, here's the result of some more cleaning - basically I've removed any nan values, and anything that doesn't need to be in a list is no longer in a list. I'll submit a PR for this, although I'm not sure where to put it.

https://github.com/DGalt/shootings-data/blob/dev/summary_ipra.csv
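The cleanup described above (dropping NaN values and unwrapping lists that only need one value) might look roughly like this. It is a sketch under assumptions, not the code behind summary_ipra.csv: records are dicts, NaN values are Python floats, a removed NaN becomes an empty string in the CSV, and lists with two or more real values are left as lists.

```python
import math


def is_nan(value):
    """True only for float NaN (numpy floats subclass float, so they count too)."""
    return isinstance(value, float) and math.isnan(value)


def clean_value(value):
    """Drop NaN entries and unwrap lists that hold a single remaining value."""
    if isinstance(value, list):
        kept = [v for v in value if not is_nan(v)]
        if not kept:
            return ""          # list held nothing but NaN
        if len(kept) == 1:
            return kept[0]     # no need to keep a one-item list
        return kept
    if is_nan(value):
        return ""
    return value


def clean_record(record):
    """Apply clean_value to every field of one record."""
    return {key: clean_value(val) for key, val in record.items()}
```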

banoonoo2 commented 8 years ago

@hieueastagile When I unzip it, I just see this directory of styling code files. My computer wants to open 'em in Corel Painter (!), but I opened 'em in TextEdit. I didn't see anything resembling a csv, Excel, or other database file. I'll have to wait for the pure csv tomorrow.

failed_with_crid
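Some background on what showed up after unzipping: an .xlsx workbook is a ZIP archive of XML parts (including a styles part), so extracting it gives a folder of markup files rather than a table. A small check like this one tells an .xlsx apart from a plain-text CSV before you try to open it. It is a sketch that relies on Excel normally storing the workbook at `xl/workbook.xml` inside the archive; `sniff_format` is a hypothetical helper name.

```python
import io
import zipfile


def sniff_format(data: bytes) -> str:
    """Guess whether downloaded bytes are an .xlsx workbook, some other ZIP, or plain text such as CSV."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as zf:
            # Excel normally keeps the main workbook part at this path.
            if "xl/workbook.xml" in zf.namelist():
                return "xlsx"
        return "zip"
    return "text"
```

If it reports "xlsx", opening the file in Excel and exporting it as CSV produces a plain-text file.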

rajivsinclair commented 8 years ago

@banoonoo2 I’ve downloaded the file that @hieueastagile uploaded, opened it in Excel, and exported it again as CSV, and now it seems perfectly fine; GitHub.com can even preview it here.

[Screenshot, 2016-06-16 23:12:33]
banoonoo2 commented 8 years ago

@rajivsinclair @hieueastagile Thanks for bearing with my technical difficulties! I'll get started on them later this evening.

hieueastagile commented 8 years ago

@banoonoo2 If you can't open it, let us know, we will send you a copy of pure csv.

banoonoo2 commented 8 years ago

I searched the IPRA portal for each CRID to see if it was included there. Most were not. 1077477 was, but only a video was provided, no documents.

I searched DocumentCloud by last name, by CRID, and by appointment date (using the dd-mmm-yyyy format used in the TRRs and OCRs). I was able to fill in a couple more Detention Aides this way.
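Generating the three search terms for each row can be scripted. This sketch assumes uppercase month abbreviations (e.g. 16-SEP-1998), which may not match every TRR or OCR, and uses hypothetical column names `last_name`, `crid`, and `appointed_date`. The month names come from a fixed tuple rather than `strftime("%b")`, whose output depends on the system locale.

```python
from datetime import date

# Fixed English abbreviations, so the output does not depend on the system locale.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def to_search_format(value: str) -> str:
    """Turn '1998-09-16' or '1998-09-16 00:00:00' into '16-SEP-1998'."""
    d = date.fromisoformat(value[:10])  # keep only the YYYY-MM-DD part
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


def search_terms(row):
    """Return the last-name, CRID and appointment-date queries for one row."""
    return [row["last_name"], str(row["crid"]), to_search_format(row["appointed_date"])]
```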

I double-checked the CPD Employees csv, but only confirmed there were no close or confident matches there.

failed_with_crid_AS.xlsx