hukkelas / DSFD-Pytorch-Inference

A High-Performance Pytorch Implementation of face detection models, including RetinaFace and DSFD
Apache License 2.0

State of the Art Face Detection in Pytorch with DSFD and RetinaFace

This repository includes:

NOTE: This implementation supports inference only, for a selection of models; all training scripts have been removed. If you want to fine-tune a model, we recommend using the original source code.


You can install this repository with pip (requires Python >= 3.6):

pip install git+

You can also install with the

python3 install

Getting started



This looks for images in the images/ folder and saves the results in the same folder, with file names ending in _out.jpg.

Simple API

To perform detection, you can simply use the following lines:

import cv2
import face_detection
detector = face_detection.build_detector(
  "DSFDDetector", confidence_threshold=.5, nms_iou_threshold=.3)
# BGR to RGB
im = cv2.imread("path_to_im.jpg")[:, :, ::-1]

detections = detector.detect(im)

This returns a tensor of shape [N, 5], where N is the number of detected faces and the five elements of each row are [xmin, ymin, xmax, ymax, detection_confidence].
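As a minimal sketch of how the [N, 5] result might be consumed, the helper below drops rows under a score cutoff and rounds the corners to integer pixel coordinates. The names `to_pixel_boxes` and `min_score` are illustrative and not part of face_detection, and the sample rows are made up.

```python
def to_pixel_boxes(detections, min_score=0.5):
    """Convert rows of [xmin, ymin, xmax, ymax, score] to integer boxes.

    Hypothetical helper: accepts any iterable of 5-element rows,
    for example the result of detector.detect(im).
    """
    boxes = []
    for xmin, ymin, xmax, ymax, score in detections:
        if score < min_score:
            continue  # drop low-confidence rows
        boxes.append((int(round(xmin)), int(round(ymin)),
                      int(round(xmax)), int(round(ymax)), float(score)))
    return boxes


# Made-up detections, for illustration only
sample = [
    [10.4, 20.6, 110.2, 140.9, 0.98],
    [200.0, 50.0, 260.0, 120.0, 0.31],
]
pixel_boxes = to_pixel_boxes(sample, min_score=0.5)
print(pixel_boxes)
```

The integer corners can then be handed to a drawing call such as cv2.rectangle.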

Batched inference

import numpy as np
import face_detection
detector = face_detection.build_detector(
  "DSFDDetector", confidence_threshold=.5, nms_iou_threshold=.3)
# Dummy batch with shape [batch size, height, width, 3]
images_dummy = np.zeros((2, 512, 512, 3))

detections = detector.batched_detect(images_dummy)
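A batch has a single height and width, so real images usually need to be stacked first. The sketch below assumes equally sized BGR images (as returned by cv2.imread) and assumes the detector expects RGB input, as in the single-image example; `stack_rgb_batch` is a hypothetical helper, not part of face_detection.

```python
import numpy as np


def stack_rgb_batch(images_bgr):
    """Stack equally sized BGR images into one [batch, height, width, 3] RGB array."""
    if not images_bgr:
        raise ValueError("need at least one image")
    shapes = {im.shape for im in images_bgr}
    if len(shapes) != 1:
        raise ValueError(f"all images must share one shape, got {sorted(shapes)}")
    batch = np.stack(images_bgr, axis=0)
    # Reverse the channel axis (BGR -> RGB) and return a contiguous copy
    return np.ascontiguousarray(batch[..., ::-1])


# Two dummy 4x6 "images" whose first (blue) channel is set to 255
frames = [np.zeros((4, 6, 3), dtype=np.uint8) for _ in range(2)]
for frame in frames:
    frame[..., 0] = 255
batch = stack_rgb_batch(frames)
print(batch.shape)
```

The stacked array could then be passed to detector.batched_detect(batch).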


Difference from DSFD

For the original source code, see here.

The main improvements in inference time come from:

Difference from RetinaFace

For the original source code, see here.

We've made the following improvements:

Inference time

These timings are rough estimates on a 1024x687 image. Each reported time is the average over 1000 forward passes on a single image, with no cuDNN benchmarking and no fp16 computation.

Device                   DSFDDetector          RetinaNetResNet50    RetinaNetMobileNetV1
CPU (Intel 2.2GHz i7) *  17,496 ms (0.06 FPS)  2970 ms (0.33 FPS)   270 ms (3.7 FPS)
NVIDIA V100-32GB         100 ms (10 FPS)       -                    -
NVIDIA GTX 1060 6GB      341 ms (2.9 FPS)      76.6 ms (13 FPS)     48.2 ms (20.7 FPS)
NVIDIA T4 16 GB          482 ms (2.1 FPS)      181 ms (5.5 FPS)     178 ms (5.6 FPS)

* Averaged over 100 forward passes on a mid-2014 15-inch Mac.
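The averaging described above can be sketched as a small timing helper. This is not the script used to produce the table: `average_latency_ms` and the warm-up count are assumptions, and `fn` stands for any callable, such as `lambda: detector.detect(im)`. On a GPU, asynchronous kernels would also need to be synchronized (for example with torch.cuda.synchronize()) before the clock is read.

```python
import time


def average_latency_ms(fn, n_runs=1000, n_warmup=10):
    """Return (mean latency in ms, FPS) over n_runs timed calls of fn()."""
    for _ in range(n_warmup):
        fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    elapsed_s = time.perf_counter() - start
    mean_ms = 1000.0 * elapsed_s / n_runs
    fps = 1000.0 / mean_ms if mean_ms > 0 else float("inf")
    return mean_ms, fps


# Stand-in workload; replace with e.g. lambda: detector.detect(im)
mean_ms, fps = average_latency_ms(lambda: sum(range(1000)), n_runs=100)
print(f"{mean_ms:.3f} ms ({fps:.1f} FPS)")
```

FPS is simply 1000 divided by the mean latency in milliseconds: 341 ms per image corresponds to roughly 2.9 FPS.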


TensorRT Inference (Experimental)

You can run RetinaFace ResNet-50 with TensorRT:

from face_detection.retinaface.tensorrt_wrap import TensorRTRetinaFace

inference_imshape = (480, 640)  # Input to the CNN
input_imshape = (1080, 1920)  # Input for original video source
detector = TensorRTRetinaFace(input_imshape, inference_imshape)
# `image` must be loaded beforehand, e.g. a frame from the video source
boxes, landmarks, scores = detector.infer(image)


If you find this code useful, remember to cite the original authors:

  title={DSFD: Dual Shot Face Detector},
  author={Li, Jian and Wang, Yabiao and Wang, Changan and Tai, Ying and Qian, Jianjun and Yang, Jian and Wang, Chengjie and Li, Jilin and Huang, Feiyue},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},

  title={RetinaFace: Single-stage Dense Face Localisation in the Wild},
  author={Deng, Jiankang and Guo, Jia and Yuxiang, Zhou and Jinke Yu and Irene Kotsia and Zafeiriou, Stefanos},