Query for faces in a remote service to enrich analysis with person IDs

sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Query for faces in a remote service to enrich analysis with person IDs #1472

Closed lfcnassif closed 6 months ago

lfcnassif commented 1 year ago

This could ease suspect identification, which is potentially useful for the recent crimes against democracy in Brazil and for other investigations. Our face encoding algorithm and the one used by the face database must match, so one of them may need to be changed. The indexing algorithm of the remote face database, if it has one, may also need to be changed if query time becomes a bottleneck.
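To illustrate why the two encodings must match: feature vectors are only comparable when they come from the same encoder, since distances between vectors from different models carry no meaning. The following is a minimal sketch, not IPED code. The function names and the 0.6 threshold are assumptions chosen for illustration, and the right threshold depends on the encoder.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two face feature vectors."""
    if len(a) != len(b):
        # Different lengths usually mean different encoders: not comparable.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_same_person(a, b, threshold=0.6):
    # The threshold is an illustrative assumption and must be tuned per encoder.
    return euclidean_distance(a, b) <= threshold


# Toy 3-dimensional vectors; real face encodings are much longer.
probe = [0.10, 0.20, 0.30]
candidate = [0.10, 0.25, 0.30]
print(is_same_person(probe, candidate))
```

Note that equal dimensions are necessary but not sufficient: two different encoders can produce vectors of the same length that still live in unrelated spaces.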

lfcnassif commented 1 year ago

Fortunately, one of our internal ElasticSearch services uses the same algorithm for many millions of faces, so we already have the feature vectors! Although there are algorithms with better accuracy, this will speed things up a lot.

There are 2 possible implementation approaches or use cases:

1. Run a batch search during processing for all faces found in the seized evidence. I'm not sure the service will handle the query throughput. This would let us index person IDs, names and other details and search for them later in the UI.
2. Add a UI feature to search on demand for details of a selected face, so we only look up faces involved in a specific criminal activity.
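The two approaches could be sketched roughly as follows. This is only an illustration under stated assumptions: `query_fn` stands for whatever client would call the remote face service, `fake_service` is a stand-in that returns a placeholder record, and the rate limit in the batch variant is a hypothetical safeguard for the throughput concern, not an existing IPED option.

```python
import time


def batch_lookup(vectors, query_fn, max_per_second=5.0, sleep=time.sleep):
    """Approach 1: query every face found during processing, rate limited."""
    min_interval = 1.0 / max_per_second
    results = {}
    for face_id, vector in vectors.items():
        started = time.monotonic()
        results[face_id] = query_fn(vector)
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            # Wait so the remote service sees at most max_per_second queries.
            sleep(min_interval - elapsed)
    return results


def on_demand_lookup(face_id, vectors, query_fn):
    """Approach 2: query only the face the examiner selected in the UI."""
    return query_fn(vectors[face_id])


def fake_service(vector):
    # Stand-in for the remote face service (hypothetical response shape).
    return {"person_id": None, "dimensions": len(vector)}


faces = {"item-1": [0.1, 0.2], "item-2": [0.3, 0.4]}
print(batch_lookup(faces, fake_service, max_per_second=100.0))
print(on_demand_lookup("item-2", faces, fake_service))
```

The batch variant pays the query cost up front for every face, while the on-demand variant sends one query per user action, which is much gentler on a service shared with other users.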

Opinions or other ideas?

hauck-jvsh commented 1 year ago

I can help with the search service, as we are already using a similar approach in the SARD project. It would be better if we could have these vectors, so I can ingest them into our OpenSearch engine and tune it a little.
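As a rough idea of what ingesting the vectors into OpenSearch could involve, the sketch below builds the request bodies for a k-NN index and a nearest-neighbour query, following the OpenSearch k-NN plugin conventions (`knn_vector` field type, `knn` query). The field names `face_vector` and `person_id` and the dimension 128 are assumptions for illustration; the actual SARD setup is not described in this thread. No request is sent, the bodies are only printed.

```python
import json


def knn_index_body(dimension, field="face_vector"):
    """Index settings and mapping for storing face feature vectors."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension},
                "person_id": {"type": "keyword"},
            }
        },
    }


def knn_query_body(vector, k=5, field="face_vector"):
    """Search body returning the k stored vectors nearest to `vector`."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


# 128 is an assumed vector length; use the dimension of the real encoder.
print(json.dumps(knn_index_body(128), indent=2))
print(json.dumps(knn_query_body([0.1, 0.2, 0.3], k=3), indent=2))
```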

lfcnassif commented 1 year ago

Thank you @hauck-jvsh! I'll try to get the vector database, so we can evaluate approach 1 without causing a DoS on a service owned by others and used for a different purpose.

lfcnassif commented 1 year ago

It should have ~100 GB of vector data!

lfcnassif commented 1 year ago

They are trying to expose a web API for us to consume instead of giving us the raw vector dataset.

InvBrunoPCSC commented 1 year ago

One request I have: unfortunately, IPED does not search for faces in PDF files.

The vast majority of records generated by systems (SINESP, Boletim de Identificação Criminal, Boletim de Identificação Individual, or the famous "Puxar a Capivara") are saved in PDF format.

Oxygen Forensic Detective has a good face categorization system. I think it could provide good ideas.

lfcnassif commented 1 year ago

> One request I have: unfortunately, IPED does not search for faces in PDF files.

Today we support images and videos. PDFs would have to be converted to images first, either externally or internally as a new feature...

> Oxygen Forensic Detective has a good face categorization system.

You mean group similar faces together? This is a different feature and could be implemented using a clustering algorithm.
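To make the clustering idea concrete, here is a minimal sketch that groups face vectors whose distance is below a threshold, using single-linkage grouping with a union-find structure. It is only one of many possible clustering techniques and is not something IPED implements according to this thread. The 0.6 threshold and the toy 2-dimensional vectors are assumptions.

```python
import math


def cluster_faces(vectors, threshold=0.6):
    """Group vector indexes so that linked faces share a cluster."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of faces that is closer than the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(vectors[i], vectors[j]) <= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


faces = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]]
print(cluster_faces(faces))
```

The pairwise loop is quadratic in the number of faces, so a real case with many faces would likely need an approximate nearest-neighbour index instead.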

InvBrunoPCSC commented 1 year ago

> You mean group similar faces together? This is a different feature and could be implemented using a clustering algorithm.

Yes. It would be a useful feature if I could identify a face and perhaps save it in a database for use in new cases.

wladimirleite commented 1 year ago

> > One request I have: unfortunately, IPED does not search for faces in PDF files.

> Today we support images and videos. PDFs would have to be converted to images first, either externally or internally as a new feature...

Wouldn't the current option to expand images from PDFs be enough? I mean setting processImagesInPDFs = true in ParsingTaskConfig.txt and uncommenting PDF Documents in CategoriesToExpand.txt, as is done in the "pedo" profile.
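The two settings mentioned above would look roughly like this. This is a sketch based only on the names given in this comment. It assumes that commented-out entries in CategoriesToExpand.txt start with `#`, and the location of the files inside the profile folder is not shown.

```
# In ParsingTaskConfig.txt
processImagesInPDFs = true

# In CategoriesToExpand.txt: remove the leading comment marker from this entry
PDF Documents
```

With both settings in place, images embedded in PDFs should be extracted as subitems, and face detection then runs on those extracted images.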

InvBrunoPCSC commented 1 year ago

> Wouldn't the current option to expand images from PDFs be enough? I mean setting processImagesInPDFs = true in ParsingTaskConfig.txt and uncommenting PDF Documents in CategoriesToExpand.txt, as is done in the "pedo" profile.

No, I ran several tests and it doesn't work with faces.

wladimirleite commented 1 year ago

> > Wouldn't the current option to expand images from PDFs be enough? I mean setting processImagesInPDFs = true in ParsingTaskConfig.txt and uncommenting PDF Documents in CategoriesToExpand.txt, as is done in the "pedo" profile.

> No, I ran several tests and it doesn't work with faces.

Well, this is off topic here, but I just tested (using IPED 4.0.7) with a few PDFs containing face images, and face detection worked fine.

Maybe some configuration is missing or incorrect, or the face recognition algorithm is unable to detect faces in the images contained in your PDFs. First of all, were images extracted from the PDFs (as subitems)? Just to be clear, face detection runs on the images extracted from the PDFs, not on the PDFs themselves. Please check your environment and execution logs and open a new issue if necessary, providing the execution log and a sample PDF.

lfcnassif commented 1 year ago

> Wouldn't the current option to expand images from PDFs be enough? I mean setting processImagesInPDFs = true in ParsingTaskConfig.txt and uncommenting PDF Documents in CategoriesToExpand.txt, as is done in the "pedo" profile.

Good idea @tc-wleite! This should work fine if configured correctly.

paulobreim commented 1 year ago

Would this help? https://github.com/harvardnlp/image-extraction

lfcnassif commented 1 year ago

> Would this help? https://github.com/harvardnlp/image-extraction

No, thanks, we already have code for this.

InvBrunoPCSC commented 1 year ago

> uncomment PDF Documents in CategoriesToExpand.txt

After applying this configuration, it worked perfectly. Sorry for my misunderstanding.

lfcnassif commented 6 months ago

Since the original case that motivated this feature is being handled in a different way and since this feature would be coupled with an internal system, I'm closing this. It may be reopened in the future if the need arises again.