Do facial recognition on a per-department basis

redshiftzero commented 8 years ago

from this repo. Face detection is fine

redshiftzero commented 8 years ago

Actually we should keep this around but just have it be optional. For some (most) jurisdictions doing facial recognition against our database is fine.

elimisteve commented 6 years ago

Hey @redshiftzero, what is the status of trying to integrate officer facial recognition into OpenOversight? I know someone, namely @brucearctor, who may be interested in working on this. He's curious what computing power is available to do the deep learning on.

Thanks!

redshiftzero commented 6 years ago

Hey @elimisteve and @brucearctor - help with integrating facial recognition is definitely welcome!

Doing facial recognition on cops is a problem for us legally in Illinois, so we'd want to first have an additional column in the Departments table to denote which departments have facial recognition turned on.

In terms of DL hardware... heads up this web application is running in prod on a pretty lightweight machine. One way to go about doing this would be to:

Spin up another box as a machine learning worker (we probably do need to use GPU acceleration to do this well)
Run a web service like this - this is pretty easy as it's using the pre-trained dlib resnet facial recognition model (background)
As images are submitted, IF they are from a department where we can do facial recognition law-wise, we'd send off the images to the API to have faces recognized (relevant: #146)
The other piece is that we'd also need to add an API endpoint to the main web app so that the machine learning worker is updated on the latest identified faces
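
The gating step in the list above could be sketched roughly as follows. This is only an illustration: the `facial_recognition_enabled` attribute and the `send_to_worker` callable (standing in for the HTTP call to the machine learning worker) are hypothetical names, not existing OpenOversight code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Department:
    name: str
    # Hypothetical column: True only where facial recognition is legally OK.
    # Defaults to False so a new department is never opted in by accident.
    facial_recognition_enabled: bool = False


def maybe_recognize(image_id: int,
                    department: Department,
                    send_to_worker: Callable[[int], dict]) -> Optional[dict]:
    """Forward an image to the ML worker only if its department is opted in."""
    if not department.facial_recognition_enabled:
        return None  # the image is never sent to the worker
    return send_to_worker(image_id)
```

Defaulting the flag to off means a department has to be explicitly approved before any of its images reach the worker.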

I'm just spitballing, so feel free to point out any obvious issues here

redshiftzero commented 6 years ago

@b-meson pointed out that police departments are apparently using https://aws.amazon.com/rekognition/faqs/, we might actually be able to use the same thing (lol)

redshiftzero commented 6 years ago

hey so i checked out using rekognition today for #6, and it's pretty good so far - see branch face-detection for an example of how to integrate it into the web application.

Facial recognition wise, there are a couple things we could do:

Auto-identify officers in photos that are submitted without precondition (while possible we'd want to avoid this)
Auto-identify officers in photos that are submitted provided they are ranked military/officers/police with high confidence (using rekognition's object labeling API)
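
A rough sketch of how the "high confidence" precondition in the second option might be checked. It assumes a response shaped like Rekognition's label detection output (a `Labels` list whose entries carry a `Name` and a `Confidence` between 0 and 100); the label names and the 90.0 cutoff are assumptions for illustration, not values taken from the face-detection branch.

```python
from typing import Iterable


def looks_like_officer(response: dict,
                       wanted: Iterable[str] = ("Police", "Officer", "Military"),
                       min_confidence: float = 90.0) -> bool:
    """Return True if any wanted label appears with enough confidence."""
    wanted_set = set(wanted)  # assumed label names; adjust to real output
    for label in response.get("Labels", []):
        name = label.get("Name")
        confidence = label.get("Confidence", 0.0)
        if name in wanted_set and confidence >= min_confidence:
            return True
    return False
```

Only photos for which this returns True would move on to facial recognition; everything else would stay in the volunteer queue.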

Integrating this would significantly reduce burden on volunteers identifying officers, so a worthwhile endeavor

ejfox commented 5 years ago

I'm interested in potentially using https://github.com/justadudewhohacks/face-api.js

This would require running a model (and re-running it when officers are added) on the server side and then making that model available to the browser for search. I'd be interested in tackling this if there's enough interest.

McEileen commented 5 years ago

Hi @ejfox, I'm carrying over the conversation we had in #654. To @camfassett's question, as long as we can ensure that we don't do facial recognition on IL officers, I am fine with incorporating it into OO. Earlier, @redshiftzero proposed adding a field in the Departments table to indicate what departments have facial recognition turned on, and I think that could work well. I'm retracting my earlier objection to adding facial recognition. Also, @ejfox, I don't have any experience working on facial recognition, but I'm NYC-based and can pair with you on the weekends or evenings if you'd like to meet with another OO team member. You're also welcome to forge ahead on your own. Thank you for your involvement!

ejfox commented 5 years ago

@McEileen That sounds good! Would love to pair on this further with anyone who is located in NYC. I'm free to meet anywhere between Midtown and Bushwick.

I did a little more exploration / research and below are my notes on potential next steps / ways to pursue this

Goal

Jane has an interaction with a police officer who fails to identify themselves
Jane takes a photo, or finds a photo of her talking to that officer; this is the unknown officer image
Jane goes to OpenOversight, uploads that photo, and draws a box around the officer's face
OpenOversight searches its database of uploaded officer images and returns any officers that have a >50% match to that officer's face as matching officers

Caveats

Illinois provides the victim with a monetary award when a Company captures, uses, stores, or transmits facial recognition data without complying with every part of the law. The amount of the monetary award depends on whether the violation was negligent, reckless, or intentional. If the Company negligently fails to comply with the Biometric Information Privacy Act, the victim whose facial geometry was mishandled is entitled to a $1,000.00 award or the value of the actual harm caused, whichever is greater. (Brabender Law LLC, "Is Facial Recognition Legal")

A lot of care needs to be taken to ensure that this tool is never run, even inadvertently, on any officer in Illinois including the Chicago Police Department.

Technical execution

I propose using face-api.js, a JavaScript-based framework for facial recognition with no external API dependencies (unlike Amazon or Microsoft, who provide APIs with similar facial recognition functionality). I think this gives us maximum control and also future-proofs the feature, as those APIs may become inaccessible in the future or be subject to political pressure not to provide services to OpenOversight.

The usage of face-api.js can be summarized as follows

To keep it simple, what we actually want to achieve, is to identify a person given an image of his / her face, e.g. the input image. The way we do that, is to provide one (or more) image(s) for each person we want to recognize, labeled with the persons name, e.g. the reference data. Now we compare the input image to the reference data and find the most similar reference image. If both images are similar enough we output the person’s name, otherwise we output ‘unknown’. (Vincent Mühler, "JavaScript API for Face Recognition in the Browser with tensorflow.js")

In the OpenOversight case

The input image is Jane's unknown officer image
The reference data is OpenOversight's existing corpus of officer images with names attached
The matching officers are any officers who have a similarity score above 50%
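
The comparison step could look roughly like the sketch below, written in Python rather than JavaScript purely for illustration. It assumes each face has already been turned into a numeric descriptor vector and that similarity is judged by Euclidean distance; the `max_distance` cutoff of 0.6 is an assumed value, and how a ">50% match" would translate into a distance is left open.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matching_officers(query: Sequence[float],
                      reference: Dict[str, List[Sequence[float]]],
                      max_distance: float = 0.6) -> List[Tuple[str, float]]:
    """Return (name, best distance) for officers close enough to the query.

    `reference` maps an officer name to the descriptors of all of that
    officer's reference images; the closest image decides the score.
    """
    matches = []
    for name, descriptors in reference.items():
        if not descriptors:
            continue  # officer has no usable reference images
        best = min(euclidean(query, d) for d in descriptors)
        if best <= max_distance:
            matches.append((name, best))
    # Closest match first.
    return sorted(matches, key=lambda m: m[1])
```

An empty result corresponds to the "unknown" outcome in the quote above.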

To accomplish this we need

A generated list (JSON or CSV) of every officer in the database, a URL of an image of their face, and their name that looks something like the following:

Name	URL
John Smith	/officer1_2012-10-16.jpg
John Smith	/officer1_2014-07-04.jpg
Jane Doe	/officer2.jpg

When generating the officer list, it may be useful to have a per-department flag useFacialRecognition as proposed by @redshiftzero so that only officers in approved departments are added to the corpus of officer images to match from
A form where a user can upload their unknown officer image that reminds them not to upload officers from Illinois
To check whether that officer exists in the database, run through every face image in the existing officer corpus and use something like faceapi.detectSingleFace(unknownOfficerImage) and if nothing is detected, move on to the next image in the database.
It may be useful to somehow tag pictures that are stored on OpenOversight as useForFacialRecognition - not all uploaded photos include only the officer, and it may be worthwhile for users to hand-prune which images are used for facial recognition. This will also double as a safeguard so that all Illinois officers will automatically have useForFacialRecognition marked as false
If a match is found, and somehow, some way, it matches an officer that is located in Illinois, it should cancel the operation immediately and not display any result
Otherwise, display all matching officers to the user and allow them to either confirm the match and add the photo to the existing officer, or deny the match and create a new officer (or abandon the process)
If no matching officers are found, redirect the user to the new officer creation page
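
As a rough illustration of the list generation and the last-line safeguard described above, the sketch below filters images on both flags, writes the Name/URL list, and discards all results if a match ever slips past the flags. The `FaceImage` class and its field names are hypothetical stand-ins for whatever the real models expose.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FaceImage:
    officer_name: str
    url: str
    department_allows_recognition: bool  # hypothetical per-department flag
    use_for_facial_recognition: bool     # hypothetical per-image flag


def is_allowed(image: FaceImage) -> bool:
    """Both the department flag and the image flag must be set."""
    return image.department_allows_recognition and image.use_for_facial_recognition


def reference_rows(images: Iterable[FaceImage]) -> List[Tuple[str, str]]:
    """Keep only images that may be used as reference data."""
    return [(i.officer_name, i.url) for i in images if is_allowed(i)]


def to_tsv(rows: Iterable[Tuple[str, str]]) -> str:
    """Render the tab-separated Name/URL list."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Name", "URL"])
    writer.writerows(rows)
    return buf.getvalue()


def safe_results(matches: List[FaceImage]) -> List[FaceImage]:
    """Show nothing at all if any match should not have been searchable."""
    if any(not is_allowed(m) for m in matches):
        return []
    return matches
```

Checking the flags twice, once when building the corpus and once before display, is deliberate redundancy: the second check only matters if the first one has a bug.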

Example for testing

A real life use case to test this on could be Oakland Police Department Deputy Chief Darren J. Allison https://openoversight.com/officer/23896

This officer has at least 2 high-quality photos where he is the only person in the photograph. Given an image of Darren J. Allison not in the OpenOversight database, this tool should be able to return his name as a suggested match. An example test image might be this one where the officer appears with multiple other people.

Open questions

What would be the best way of building the dynamic list of officer names + image URLs that the matching tool will run through?
What are other safeguards that can be put in place to ensure this is not run on any officer (or person) in the state of Illinois?
What are the pros/cons of using face-api.js vs Amazon Rekognition?

McEileen commented 5 years ago

I am currently working on integrating AWS Rekognition's label detection feature into OpenOversight. I am writing this summary of my approach to make my work visible to other OpenOversight contributors, and also get feedback on my approach. Thoughts are appreciated!

Goal

In order to reduce the amount of work volunteers need to do, automate the step where volunteers classify a photo as containing police officers.
Classify all new photos that are uploaded as containing or not containing officers.
Classify all unsorted photos in the s3 bucket as containing or not containing officers.

Technical implementation

Classify all new photos that are uploaded as containing or not containing officers.

In the upload_image_to_s3_and_save_to_db method, the detect_officers method will run each photo through Rekognition.
- If Rekognition finds that the photo contains police officers, the contains_cops field is set to True.
- If Rekognition does not determine that the photo contains officers, no value is entered for contains_cops and a volunteer will later manually review this photo.
detect_officers is called on photos of police officers from all departments, including Chicago, because Rekognition's label detection feature does not violate BIPA.
All photos run through Rekognition have the sorted_by_rekognition field set to true.
The current implementation I wrote makes calls directly to Rekognition without using a queue. I am going to refactor this to place the task on a queue, either rq or AWS Simple Queue Service (SQS).
- I oppose using AWS Kinesis, because if a task on the queue fails, Kinesis will continually retry the task until the data expires, which can take up to seven days.
- If a task fails on SQS, it will retry three times before sending the task to a dead-letter queue.
- I am not familiar with rq's failure behavior and will research it before choosing a tool.
- Request for input: what are the trade-offs in using SQS or rq?
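
For the SQS option, the number of attempts before a message moves to a dead-letter queue is set per queue through the `RedrivePolicy` attribute. A minimal sketch of building that attribute for three attempts follows; the helper name is hypothetical and the ARN is a placeholder supplied by the caller.

```python
import json


def dead_letter_attributes(dlq_arn: str, max_receives: int = 3) -> dict:
    """Build queue attributes that redirect repeatedly failing messages.

    After a message has been received `max_receives` times without being
    deleted, SQS moves it to the queue identified by `dlq_arn`.
    """
    if max_receives < 1:
        raise ValueError("max_receives must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receives),
    }
    # SQS expects the policy itself as a JSON string.
    return {"RedrivePolicy": json.dumps(policy)}
```

The returned dict is meant to be passed as the `Attributes` argument when creating the queue or updating its attributes with boto3.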

Classify all unsorted photos in the s3 bucket as containing or not containing officers.

I had initially considered using a Lambda to run this task on a weekly basis. However, once all existing unsorted photos have been run through Rekognition one time, the detect_officers call in upload_image_to_s3_and_save_to_db will sort every new photo on upload, so no other photos in the database would be left needing to be sorted by Rekognition.
Instead, I decided to create a new endpoint, run_unsorted_stored_photos_through_Rekognition, that can only be accessed by admins.
If an admin hits the button associated with this endpoint, the code will query the raw_images table for all Images that don't have the sorted_by_rekognition field set to True and which do not have contains_cops set to True.
Next, the code will call detect_officers on every returned image, then update the Image's contains_cops and sorted_by_rekognition fields.
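
A sketch of the backfill logic described above, using an in-memory stand-in for rows of the raw_images table. The `Image` dataclass and the `detect_officers` callable passed in are simplifications for illustration, not the actual model or method.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Image:
    id: int
    contains_cops: Optional[bool] = None   # None means not yet decided
    sorted_by_rekognition: bool = False


def needs_sorting(image: Image) -> bool:
    """Select images not yet sorted and not already marked as cops."""
    return image.sorted_by_rekognition is not True and image.contains_cops is not True


def backfill(images: Iterable[Image],
             detect_officers: Callable[[Image], bool]) -> List[int]:
    """Run detection on every unsorted image and return the ids touched."""
    updated = []
    for image in images:
        if not needs_sorting(image):
            continue
        if detect_officers(image):
            image.contains_cops = True
        # A negative result leaves contains_cops unset for volunteer review.
        image.sorted_by_rekognition = True
        updated.append(image.id)
    return updated
```

Because every processed image is marked as sorted, pressing the admin button a second time selects nothing and makes no further Rekognition calls.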

Areas I Still Need to Research

How much will using Rekognition cost?
How much would using AWS SQS cost?
What is rq's failure behavior?

Areas Where I'd Appreciate Feedback

Which queue is the best option for use in detect_officers? What are the tradeoffs of using rq or AWS SQS?
What are the potential downsides of making an admin-only endpoint for running stored images through Rekognition?

brucearctor commented 5 years ago

@McEileen: I'm not familiar with rq, but what you are proposing as a workflow sounds like a common pattern: an S3 event triggers SQS, which then runs Rekognition. A given AWS account gets 1 million free SQS requests per month, so it potentially costs nothing if we don't hit that volume across applications, and it is very inexpensive afterwards. Rekognition is also pay-per-use ("serverless") rather than always-on, so the pricing is about as reasonable as it can get.

rbavery commented 4 years ago

Hi all, I started looking for a project like this after seeing reports in Sacramento and elsewhere that police officers weren't wearing name tags and badge numbers to be identified. I want to help out with this feature if help is wanted. I've used AWS and Azure to manage computer vision/deep learning research projects using python, so I have some background in the infrastructure discussed as well as the software engineering/methods.

Is this currently being worked on?

McEileen commented 4 years ago

Hi @rbavery, thanks for reaching out! I had worked on this previously, then I paused in order to focus on deployment. If you want, I can share my work, and we can collaborate? Please let me know if you have any questions about the project or about Lucy Parsons Labs.

rbavery commented 4 years ago

Thanks for the info and availability to answer questions. That'd be great! My email is ravery@ucsb.edu for sharing previous work. As I understand it, the goals are the same as you identified earlier in the issue.

I saw this issue: https://github.com/lucyparsons/OpenOversight/issues/654. I think the use case @ejfox raises is compelling and valuable, but you also raise really valid concerns about legality. Have there been any updates on this potential use case of using face detection to match officers to database records and provide this information to users in real time (for departments where this is allowed)?

I'm also curious what the status of this PR is and how it relates to this issue: https://github.com/lucyparsons/OpenOversight/pull/517

I'm also curious about the larger goals of the project. The About page states that the site seeks to record (among other identifying info): "...mentions in news articles, salaries...." From browsing the database I haven't found instances of these (which is understandable, it sounds more difficult). I think this could be a valuable component of the project and could see, for example, protesters benefiting from being able to record and search for particular events/descriptions associated with particular officers. Is recording event/story info still a long term goal of the project?

McEileen commented 4 years ago

Thanks for taking the time to read through past issues and PRs related to this work.

I created a draft PR that shows work I did in 2019 for integrating both object detection and facial recognition into the project. In response to the concern about determining which departments allow facial recognition, I added a boolean field to the Department model, is_facial_recognition_allowed. Facial recognition would only be run on photos for departments where that is marked true. Additionally, users see a pop-up alert when they submit photos to a department that uses facial recognition. This is still a working solution to the legality concerns, and additional ideas are also welcome.

The above PR is very rough, particularly the facial recognition part. I had initially tried to incorporate face-api.js for Node.js, but had difficulty getting it to work with the existing project set-up. I did get it to work for examples using face-api.js in the browser, but it's extremely slow, and the way that the photos are stored isn't sustainable. Instead of face-api.js, I think we should use a tool designed to work with Python-Flask.

I don't know if @redshiftzero is still working on PR #517. I can also make a separate PR for the Rekognition work I did in the linked draft PR.

Personally, I haven't focused on recording event or story info for OpenOversight. I think our limited capacity could make it difficult to add to the project. However, please let me know if you're interested in working on it.

rbavery commented 4 years ago

Got it, thanks for getting the previous PR up. Good to hear that a Python-Flask approach is welcome. I've used this approach before for segmenting agriculture fields in satellite images: https://github.com/ecohydro/CropMask_RCNN/tree/master/app. We might be able to use this as a template. I'm not familiar with Rekognition, but it sounds like it would replace a Python-Flask approach. From the wiki:

SearchFaces enables users to import a database of images with pre-labeled faces, to train a machine learning model on this database, and to expose the model as a cloud service with an API. Then, the user can post new images to the API and receive information about the faces in the image. The API can be used to expose a number of capabilities, including identifying faces of known people, comparing faces, and finding similar faces in a database.

Using python-flask would probably allow us to not pay the Rekognition premium (and use a cheaper AWS instance) but might take more time to build. I'd definitely be interested in seeing the Rekognition work.

elimisteve commented 3 years ago

@redshiftzero This activist and programmer featured in the New York Times 2 days ago built his own neural net to perform facial recognition of police officers: https://www.nytimes.com/2020/10/21/technology/facial-recognition-police.html . I wonder if his code is somewhere on GitHub! And you may want to reach out to him 👍 .

elimisteve commented 3 years ago

Mr. Howell said his tool remained a work in progress and could recognize only about 20 percent of Portland’s police force. He hasn’t made it publicly available, but he said it had already helped a friend confirm an officer’s identity.

elimisteve commented 3 years ago

Jen herself is quoted in the article, and OpenOversight is mentioned, too! Pretty cool.

rbavery commented 3 years ago

@elimisteve @redshiftzero I'm currently working on something similar to Chris Howell's tool with @porefluid (maker of NYPD MaskWatch https://twitter.com/nypdmaskwatch?lang=en).

We have code to run object detection models to separate police and non-police and a server set up to annotate photos to improve model accuracy. The models and code we use are based on the open source detectron2 project, no AWS Rekognition needed.

I think it'd be great to get in touch with Chris Howell to share code resources and training data, since we are currently working with a pretty small NYPD dataset and more training data can only help. Anyone know his GitHub handle? I can't find it; there are a lot of Chris Howells out there.

My hope is that the model we are working on can be incorporated into OpenOversight in the future to make it easier to submit a photo that is processed to be civilian free and have the cop or cops cropped out into their own individual photos to be recorded in the database.
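
The cropping step could be sketched along these lines, assuming the detector yields a label, a score, and an (x1, y1, x2, y2) pixel box per detection. The `Detection` type, the "police" label string, the 0.8 score cutoff, and the 10% padding are all assumptions for illustration rather than details of the detectron2-based code.

```python
from typing import Iterable, List, NamedTuple, Tuple

Box = Tuple[int, int, int, int]


class Detection(NamedTuple):
    label: str
    score: float
    box: Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def officer_crops(detections: Iterable[Detection],
                  width: int,
                  height: int,
                  min_score: float = 0.8,
                  pad: float = 0.1) -> List[Box]:
    """Return one padded crop rectangle per confident police detection."""
    crops = []
    for det in detections:
        if det.label != "police" or det.score < min_score:
            continue  # civilians and weak detections are left out
        x1, y1, x2, y2 = det.box
        dx = (x2 - x1) * pad
        dy = (y2 - y1) * pad
        # Clamp the padded box to the image bounds.
        crops.append((max(0, int(x1 - dx)), max(0, int(y1 - dy)),
                      min(width, int(x2 + dx)), min(height, int(y2 + dy))))
    return crops
```

Each returned rectangle could then be passed to an image library's crop function to produce one civilian-free photo per officer.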

porefluid commented 3 years ago

Great idea @rbavery! I just reached out to try to find a lead on contacting the activist mentioned in the NYT article but I think it would be fantastic if we can share data. I'll check back in if/when I find a lead
