Closed by lfcnassif 6 months ago
Fortunately, one of our internal ElasticSearch services uses the same algorithm for many millions of faces, so we already have the feature vectors! Although there are algorithms with better accuracy, this will speed things up a lot.
We have 2 implementation approaches or possible use cases:
Opinions or other ideas?
I can help with the search service, as we are already using a similar approach in the SARD project. It would be better if we could have these vectors, so I can ingest them into our OpenSearch engine and tweak it a little.
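To make the idea of searching a face by its feature vector concrete, here is a minimal brute-force sketch. It is not the SARD or OpenSearch implementation: the function names, the toy 3-dimensional vectors, and the `max_distance` threshold of 0.6 are all assumptions for illustration, and a real engine would use an index instead of scanning every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_faces(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    max_distance: float = 0.6,  # hypothetical threshold, tune per algorithm
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return up to top_k (face_id, distance) pairs within max_distance, closest first."""
    hits = [(face_id, euclidean(query, vec)) for face_id, vec in database.items()]
    hits = [hit for hit in hits if hit[1] <= max_distance]
    hits.sort(key=lambda hit: hit[1])
    return hits[:top_k]


# Toy database: real face encodings have many more dimensions.
database = {
    "person-a": [0.10, 0.20, 0.30],
    "person-b": [0.90, 0.80, 0.70],
    "person-c": [0.12, 0.18, 0.33],
}
matches = search_faces([0.11, 0.19, 0.31], database)
print(matches)
```

The query vector and the stored vectors must come from the same encoding algorithm, otherwise the distances are meaningless.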
Thank you @hauck-jvsh! I'll try to get the vector database, so we can evaluate approach 1 without causing a DoS in a service that is owned by others and used for a different use case.
edited: It should have ~100 GB of vector data!
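As a rough sanity check on that size, the number of stored vectors can be estimated from the total volume. The sketch below assumes 128-dimensional vectors and ignores any per-record metadata; neither the dimensionality nor the value width is stated in this thread, so both are parameters.

```python
def estimated_vector_count(total_bytes: int, dimensions: int, bytes_per_value: int) -> int:
    """Estimate how many raw vectors fit in total_bytes, ignoring metadata overhead."""
    return total_bytes // (dimensions * bytes_per_value)


total = 100 * 10**9  # ~100 GB, using decimal gigabytes

# Assumed layouts: 128 values per face, stored as 32-bit or 64-bit floats.
count_float32 = estimated_vector_count(total, dimensions=128, bytes_per_value=4)
count_float64 = estimated_vector_count(total, dimensions=128, bytes_per_value=8)
print(count_float32, count_float64)
```

Under these assumptions the data set would hold on the order of one to two hundred million vectors.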
They are trying to expose a web API for us to consume instead of giving us the raw vector data set.
One demand I have is that IPED unfortunately does not search for faces in PDF files.
The vast majority of records generated by systems (SINESP, Boletim de Identificação Criminal, Boletim de Identificação Individual, or the famous "Puxar a Capivara") are saved in PDF format.
The Oxygen Forensics Detective software has a good face categorization system. I think it could provide good ideas.
Today we support images and videos. PDFs would have to be converted to images beforehand, either externally or internally as a new feature...
You mean group similar faces together? This is a different feature and could be implemented using a clustering algorithm.
Yes, it would be an interesting resource if I could identify the face and maybe save it in a database, so I could use it in new cases.
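For reference, grouping similar faces could be sketched with a very simple greedy clustering over the feature vectors. This is only an illustration, not an existing IPED feature: the function names, the 2-dimensional toy vectors, and the 0.6 distance threshold are assumptions, and a production version would more likely use an established clustering algorithm.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_faces(vectors: List[Sequence[float]], max_distance: float = 0.6) -> List[List[int]]:
    """Greedy clustering: each vector joins the first cluster whose
    representative (its first member) is within max_distance,
    otherwise it starts a new cluster. Returns lists of vector indices."""
    clusters: List[List[int]] = []
    for index, vec in enumerate(vectors):
        for cluster in clusters:
            if euclidean(vec, vectors[cluster[0]]) <= max_distance:
                cluster.append(index)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([index])
    return clusters


# Toy data: two visibly separate groups of "faces".
faces = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [0.05, 0.05]]
groups = cluster_faces(faces)
print(groups)
```

Each resulting group could then be labeled once and stored, so that faces from new cases are compared against the known groups.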
Wouldn't the current option to expand images from PDFs be enough? I mean setting processImagesInPDFs = true in ParsingTaskConfig.txt and uncommenting PDF Documents in CategoriesToExpand.txt, like it is done in the "pedo" profile.
No, I did several tests and it doesn't work with faces.
Well, this is off topic here, but I just tested (using IPED 4.0.7) with a few PDFs containing face images, and face detection worked fine.
Maybe some configuration is missing / incorrect, or the face recognition algorithm is unable to detect faces in the images contained in your PDFs. First of all, were images extracted from the PDFs (as subitems)? Just to be clear, face detection will happen on the images extracted from the PDFs, not in the PDFs themselves. Please, check your environment / execution logs and open a new issue if necessary, providing the execution log and a sample PDF.
Good idea @tc-wleite! This should work fine if configured correctly.
Does this help? https://github.com/harvardnlp/image-extraction
No, thanks, we already have code for this.
uncomment PDF Documents in CategoriesToExpand.txt
After performing this configuration it worked perfectly, sorry for my lack of understanding.
Since the original case that motivated this feature is being handled in a different way and since this feature would be coupled with an internal system, I'm closing this. It may be reopened in the future if the need arises again.
This could ease suspect identification, which is potentially useful for the recent crimes against democracy in Brazil and for other investigations. Our face encoding algorithm and the one used in the face database should match, so one of them may need to be changed. The remote face database's indexing algorithm, if there is one, may also need to be changed if query time becomes a bottleneck.