A simple Android app that performs on-device face recognition by comparing FaceNet embeddings against a vector database of faces enrolled by the user.
Clone the `main` branch,

```
$> git clone --depth=1 https://github.com/shubham0204/OnDevice-Face-Recognition-Android
```

Perform a Gradle sync, and run the application.
The app provides two FaceNet models that differ in the size of the embedding they produce: `facenet.tflite` outputs a 128-dimensional embedding and `facenet_512.tflite` a 512-dimensional embedding. In `FaceNet.kt`, you may switch models by changing the path of the TFLite model,
```kotlin
// facenet
interpreter =
    Interpreter(FileUtil.loadMappedFile(context, "facenet.tflite"), interpreterOptions)

// facenet-512
interpreter =
    Interpreter(FileUtil.loadMappedFile(context, "facenet_512.tflite"), interpreterOptions)
```
Next, change `embeddingDim` in the same file,
```kotlin
// facenet
private val embeddingDim = 128

// facenet-512
private val embeddingDim = 512
```
Then, in `DataModels.kt`, change the dimensions of the `faceEmbedding` attribute,
```kotlin
@Entity
data class FaceImageRecord(
    // primary key of `FaceImageRecord`
    @Id var recordID: Long = 0,
    // personID is derived from `PersonRecord`
    @Index var personID: Long = 0,
    var personName: String = "",
    // the FaceNet-512 model provides a 512-dimensional embedding
    // the FaceNet model provides a 128-dimensional embedding
    @HnswIndex(dimensions = 512)
    var faceEmbedding: FloatArray = floatArrayOf()
)
```
We use the FaceNet model which, given a 160 × 160 cropped face image, produces an embedding of 128 or 512 elements that captures facial features uniquely identifying the face. We represent the embedding model as a function $M$ that accepts a cropped face image and returns a vector/embedding/list of floating-point numbers.
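As a sketch of how such embeddings are compared: with $M$ as above, two images $I_1$ and $I_2$ of the same person should yield embeddings with a high cosine similarity,

$$\mathrm{sim}\big(M(I_1), M(I_2)\big) = \frac{M(I_1) \cdot M(I_2)}{\lVert M(I_1) \rVert \, \lVert M(I_2) \rVert}$$

while images of different people should yield a lower value.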
1. We use MLKit's `FaceDetector` to crop faces from the images given by the user. Each image is labelled with the person's name. See `MLKitFaceDetector.kt`.
2. Each cropped face is transformed into an embedding with the FaceNet model and stored in the vector database. See `FaceNet.kt`.
3. For a new camera frame, we crop faces with the `FaceDetector` as in (1) and produce face embeddings for each face as in (2). We compare each face embedding (the query vector) with those present in the vector database, and determine the name/label of the embedding (the nearest neighbor) closest to the query vector using cosine similarity. See `ImageVectorUseCase.kt`.
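The nearest-neighbor comparison can be sketched in plain Kotlin as follows. This is a brute-force illustration of the idea only; the project itself delegates the search to the vector database's HNSW index, and the function names here (`cosineSimilarity`, `nearestNeighbor`) are hypothetical:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embeddings of equal length
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Return the label of the stored embedding closest to the query vector,
// or null if no embeddings are stored
fun nearestNeighbor(
    query: FloatArray,
    records: List<Pair<String, FloatArray>>
): String? = records.maxByOrNull { (_, embedding) ->
    cosineSimilarity(query, embedding)
}?.first
```

A linear scan like this is O(n) in the number of stored faces; an HNSW index gives approximate nearest-neighbor search in roughly logarithmic time, which matters as the database grows.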
See issue #1
Face-liveness detection is the process of determining whether the face captured in the camera frame is real or a spoof (a photo, a 3D model, etc.). There are many techniques for face-liveness detection, the simplest being smile or wink detection. These are effective against static spoofs (pictures or 3D models) but not against video spoofs.
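A smile challenge of this kind could be sketched as follows. This is a hypothetical illustration, not code from the project: it assumes a per-frame smile probability is available (ML Kit's face detector, for instance, exposes a `smilingProbability` field when classification is enabled), and the thresholds are made-up values:

```kotlin
// A face passes the challenge if it was seen both in a neutral state
// (low smile probability) and smiling (high smile probability) during
// the capture window — i.e. the user actively changed expression.
fun passedSmileChallenge(
    smileProbabilities: List<Float>,
    neutralThreshold: Float = 0.2f,   // hypothetical threshold
    smileThreshold: Float = 0.8f      // hypothetical threshold
): Boolean {
    val sawNeutral = smileProbabilities.any { it < neutralThreshold }
    val sawSmile = smileProbabilities.any { it > smileThreshold }
    return sawNeutral && sawSmile
}
```

As noted above, a pre-recorded video of the person smiling would still defeat such a check, which motivates model-based approaches like the one below.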
While exploring the deepface library, I discovered that it implements an anti-spoof detection system using the PyTorch models from the Silent-Face-Anti-Spoofing repository. It uses a combination of two models that operate on two different scales of the same image. During training, the models are penalized with a classification loss (cross-entropy) and with a loss on the difference between the Fourier transform of the image and the intermediate features of the CNN.
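At inference time, the two scale-specific classifiers have to be combined into a single decision. A minimal sketch of one common way to do this — summing the softmax scores of both models and taking the arg-max, which is how deepface, to my understanding, combines them — is shown below; the function names and the assumption of raw logits as inputs are mine:

```kotlin
import kotlin.math.exp

// Numerically stable softmax over a vector of logits
fun softmax(logits: FloatArray): FloatArray {
    val maxLogit = logits.maxOrNull()!!
    val exps = logits.map { exp((it - maxLogit).toDouble()).toFloat() }
    val sum = exps.sum()
    return exps.map { it / sum }.toFloatArray()
}

// Combine the two scale-specific models by summing their softmax
// scores, then return the index of the class with the highest
// combined score
fun combinePredictions(logitsScale1: FloatArray, logitsScale2: FloatArray): Int {
    val p1 = softmax(logitsScale1)
    val p2 = softmax(logitsScale2)
    val combined = FloatArray(p1.size) { p1[it] + p2[it] }
    return combined.indices.maxByOrNull { combined[it] }!!
}
```

Summing (rather than multiplying) the class scores keeps a single over-confident model from dominating the decision when the two scales disagree.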
The models used by the deepface library (the same as in the Silent-Face-Anti-Spoofing repository) are in the PyTorch format. The project already uses the TFLite runtime to execute the FaceNet model, and adding another DL runtime would bloat the application unnecessarily.
I converted the PyTorch models to TFLite using this notebook: https://github.com/shubham0204/OnDevice-Face-Recognition-Android/blob/main/resources/Liveness_PT_Model_to_TF.ipynb
How does this project differ from the FaceRecognition_With_FaceNet_Android project?

FaceRecognition_With_FaceNet_Android is a similar project, initiated in 2020 and iterated on several times since then. Here are the key similarities and differences with this project: