lucyparsons / OpenOversight

Police oversight and accountability through public data 👮
https://openoversight.com
GNU General Public License v3.0
240 stars 79 forks

bulk_add_officers update by ID #743

Closed abandoned-prototype closed 4 years ago

abandoned-prototype commented 4 years ago

When using the bulk_add_officers command, officers can be identified either by the unique_internal_identifier or by a combination of first name, last name, and badge number. @brianmwaters also has a PR pending that allows identifying officers by first and last name alone, for use by smaller departments that don't have any name collisions.

For Chicago, these options do not seem to be sufficient. Badge numbers seem to be assigned to the role and not the person, which means that they change when an officer is promoted. Internally, CPD uses employee numbers as unique identifiers, but these are usually redacted in records requests.

Therefore, I propose the following: bulk_add_officers also accepts the ID of the officer in the database as the key for merging the incoming csv file with the existing data. So if an id field is present in the provided csv, an officer will be selected solely based on that id. If there is no officer with that id, the command fails. If the field is empty, a new officer will be added to the database, regardless of all other fields present.
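To make the proposed rules concrete, here is a minimal sketch in plain Python. It is not OpenOversight code: the function name `plan_import`, the set `existing_ids` standing in for the database, and the column names other than `id` are all hypothetical. It only classifies rows and does not write anything.

```python
import csv
import io


def plan_import(csv_text, existing_ids):
    """Classify each CSV row as an update or a creation, keyed only on `id`.

    `existing_ids` is a hypothetical stand-in for the officer ids already
    stored in the database.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if "id" not in (reader.fieldnames or []):
        raise ValueError("this sketch only covers CSV files with an id column")

    plan = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        raw_id = (row["id"] or "").strip()
        if raw_id == "":
            # Empty id: always a new officer, whatever the other fields say.
            plan.append(("create", None, row))
            continue
        officer_id = int(raw_id)
        if officer_id not in existing_ids:
            # Unknown id: fail instead of guessing at a match.
            raise ValueError(f"line {line_no}: no officer with id {officer_id}")
        plan.append(("update", officer_id, row))
    return plan
```

For example, with an existing officer 12, a file containing one row with id 12 and one row with an empty id would yield one update and one creation, while a row with id 99 would abort the whole run.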

The downside of this implementation is that if the id field is present but left empty, each row will create a new officer, which could be unintentional. Further, updating the existing roster requires working with the data from the website, i.e. downloading the csv from https://openoversight.lucyparsonslabs.com/download/all and then merging accordingly. On the plus side, this implementation avoids unintentional merges of officers and gives the creator of the csv full control over what happens once the file is merged, which seems critical to me when it comes to updating the rosters of police departments with 10,000+ officers.

I am happy to work on implementing this, but I am hoping for comments on whether this makes sense.

abandoned-prototype commented 4 years ago

Update on this: I decided to add a new command to accomplish this, so as not to add even more complexity to the existing bulk_add_officers. Basically, I want it to be relatively close to what a COPY table FROM path CSV sql command would do, meaning that there is less error correction but closer control over the state of the database after running the command. I imagine this command being useful for big departments that need complex merges or that have otherwise messy data necessitating some data engineering before import, while smaller departments can still use the bulk_add_officers command and manually fix potential issues that arise afterwards. I am currently working on allowing bulk import of incident data. I will put out a draft PR soon, so this will hopefully make more sense then.
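As a rough illustration of the COPY-like behaviour described here, the sketch below applies rows exactly as given and rolls everything back on the first error. It makes several assumptions: it uses the standard-library sqlite3 module rather than the project's actual database layer, the `officers` table and its columns are invented for the example, and `strict_import` is a hypothetical name, not the command from the upcoming PR.

```python
import csv
import io
import sqlite3


def strict_import(conn, csv_text):
    """Apply rows verbatim; any error rolls back the whole import.

    Assumes a hypothetical table officers(id, first_name, last_name).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    with conn:  # commits on success, rolls back if an exception escapes
        for row in rows:
            if row["id"]:
                cur = conn.execute(
                    "UPDATE officers SET first_name = ?, last_name = ? WHERE id = ?",
                    (row["first_name"], row["last_name"], int(row["id"])),
                )
                if cur.rowcount != 1:
                    # No error correction: an unknown id aborts the import.
                    raise ValueError(f"no officer with id {row['id']}")
            else:
                # Empty id: insert a new officer and let the database pick the id.
                conn.execute(
                    "INSERT INTO officers (first_name, last_name) VALUES (?, ?)",
                    (row["first_name"], row["last_name"]),
                )
    return len(rows)
```

The point of the design is that the database ends up either fully updated or unchanged, so a failed run on a large roster never leaves a partial merge behind.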