justadudewhohacks / face-api.js

JavaScript API for face detection and face recognition in the browser and nodejs with tensorflow.js
MIT License
16.64k stars 3.7k forks

Improve speed #32

Closed MaciejWWojcik closed 6 years ago

MaciejWWojcik commented 6 years ago

Hello, first of all I would like to say that this library is beautiful, and I'm really impressed by how well it works, particularly how precise it is.

I have an issue with the speed of face and landmark detection. I feel that with this level of precision it's hard to make it faster, but I'm really interested in even slightly faster detection.

I'm using this library with the stream from my camera, and I get very low frame rates (around 1 FPS for face detection).

So, is there any known factor or setting that I could change to increase the speed of face detection? I'm aware that precision will probably decrease, but I don't care much about that.

This is my code for detecting and drawing face landmarks on the camera stream:

<html>
<head>
    <script src="face-api.js"></script>
    <script src="commons.js"></script>
    <link rel="stylesheet" href="styles.css">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/materialize/0.100.2/css/materialize.css">
    <script type="text/javascript" src="https://code.jquery.com/jquery-2.1.1.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/0.100.2/js/materialize.min.js"></script>
</head>
<body>

<video id="video" autoplay width="480" height="360"></video>
<canvas id="overlay" width="480" height="360"></canvas>

<script>
    const video = document.getElementById('video');
    const canvas = document.getElementById('overlay')
    canvas.width = 480;
    canvas.height = 360;

    navigator.mediaDevices.getUserMedia({audio: true, video: true}).then(
        stream => {
            video.srcObject = stream;
            video.play();
            video.muted = true;
            run()
        }
    ).catch(e => console.warn(e));

    const minConfidence = 0.6
    const maxResults = 1

    async function run() {
        await faceapi.loadFaceLandmarkModel('/');
        await faceapi.loadFaceDetectionModel('/');
        requestAnimationFrame(processFrame)
    }

    async function processFrame() {

        const detections = await faceapi.locateFaces(video, minConfidence, maxResults)
        const faceTensors = await faceapi.extractFaceTensors(video, detections)
        let landmarksByFace = await Promise.all(faceTensors.map(t => faceapi.detectLandmarks(t)))
        faceTensors.forEach(t => t.dispose())

        if(landmarksByFace.length > 0){
            landmarksByFace = landmarksByFace.map((landmarks, i) => {
                const box = detections[i].forSize(480, 360).getBox()
                return landmarks.forSize(box.width, box.height).shift(box.x, box.y)
            })
            canvas.getContext('2d').clearRect(0, 0, canvas.width, canvas.height)
            faceapi.drawLandmarks(canvas, landmarksByFace[0], {drawLines: true})
        }
        requestAnimationFrame(processFrame)
    }
</script>

</body>
</html>

And this code only detects the face (1 FPS on Chrome):

  navigator.mediaDevices.getUserMedia({audio: true, video: true}).then(
        stream => {
            video.srcObject = stream;
            video.play();
            video.muted = true;
            run()
        }
    ).catch(e => console.warn(e));

    const minConfidence = 0.6
    const maxResults = 1

    async function run() {
        await faceapi.loadFaceLandmarkModel('/');
        await faceapi.loadFaceDetectionModel('/');
        requestAnimationFrame(processFrame)
    }

    async function processFrame() {
        const detections = await faceapi.locateFaces(video, minConfidence, maxResults)
        const detectionsForSize = detections.map(det => det.forSize(480, 360))
        canvas.getContext('2d').clearRect(0, 0, canvas.width, canvas.height)
        faceapi.drawDetection(canvas, detectionsForSize)
        requestAnimationFrame(processFrame)
    }
justadudewhohacks commented 6 years ago

Hi,

I know, face detection is currently the bottleneck. The main reason is probably that the net works on 512x512 sized images. I was planning on providing an option to alternatively run it on 256x256 sized images; first measurements showed a speedup of 4x - 6x, which, at least on my machine, achieved real-time speed.

However, this change requires training an additional model on top of the current one. I am currently working on an API to make the nets conveniently trainable, so that I can hopefully refine some of the models.

Apart from that, looking at your code, I assume you are trying to get the face landmark positions of a user's face from a webcam? It is probably also possible to pass the frames directly into the landmark net if only a single face is shown, but you would have to play some tricks to get a rough estimation of the bounding box, or at least crop the frame to a square. Not sure whether that will work out sufficiently well, though.
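The crop-to-a-square idea above can be sketched as follows. `getCenteredSquareBox` is an illustrative helper, not part of face-api.js; the `faceapi.detectLandmarks` usage in the comments mirrors the snippet earlier in the thread and is untested:

```javascript
// Compute a centered square region of a frame so the landmark net can be
// fed directly, skipping the face detector entirely.
// Note: getCenteredSquareBox is a made-up helper, not face-api.js API.
function getCenteredSquareBox(width, height) {
  const size = Math.min(width, height)
  return { x: (width - size) / 2, y: (height - size) / 2, size }
}

// In the browser, one could then draw that region onto an offscreen canvas
// and pass it to the landmark net (hedged sketch):
//   const { x, y, size } = getCenteredSquareBox(video.videoWidth, video.videoHeight)
//   const crop = document.createElement('canvas')
//   crop.width = crop.height = size
//   crop.getContext('2d').drawImage(video, x, y, size, size, 0, 0, size, size)
//   const landmarks = await faceapi.detectLandmarks(crop)

console.log(getCenteredSquareBox(480, 360))  // { x: 60, y: 0, size: 360 }
```

For the 480x360 stream in the thread, this would feed a 360x360 center crop to the net; a face far off-center would of course be cut off, which is the "rough estimation" caveat mentioned above.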

At least the forwarding time of the face landmark net should be much faster. On my machine it's about 60 - 80 FPS; even on my slow laptop it runs at about 30 FPS.

MaciejWWojcik commented 6 years ago

Super, so what should I do to pass frames directly into the landmark net? It would be great if I could reach 30 FPS 👍

justadudewhohacks commented 6 years ago

I just released v0.8.0, which implements MTCNN as an alternative face detector. I have played around with live webcam detection on my laptop a bit, and I can achieve roughly real-time performance with MTCNN, in case you want to give it a try. You can also find an MTCNN webcam example here.
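MTCNN's speed is dominated by its first stage, which slides a small 12x12 proposal net over an image pyramid, so raising the minimum detectable face size shrinks the pyramid and saves forward passes. A rough back-of-the-envelope sketch (`numPyramidScales` is an illustrative helper, not face-api.js API; 0.709 is the scale factor commonly used by MTCNN implementations):

```javascript
// Count how many pyramid scales MTCNN's stage-1 net (12x12 receptive field)
// has to process for a given minimum face size. Fewer scales means fewer
// forward passes, hence the speedup from a larger minimum face size.
function numPyramidScales(minImageDim, minFaceSize, scaleFactor = 0.709) {
  let dim = minImageDim * (12 / minFaceSize)
  let count = 0
  while (dim >= 12) {
    count++
    dim *= scaleFactor
  }
  return count
}

// For a 480x360 frame (min dimension 360):
console.log(numPyramidScales(360, 20))   // 9 scales
console.log(numPyramidScales(360, 100))  // 4 scales
```

So for a webcam feed where the face fills a good part of the frame, a larger minimum face size roughly halves the stage-1 work, which matches the real-time behaviour described above.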

MaciejWWojcik commented 6 years ago

Wow, it's really much better than my previous experiments 👍 It's a pity that the precision isn't as good, but the FPS is great. It's not ideal face tracking yet, but I see big progress. Looking forward to the next version with improvements. Kind regards

adriaciurana commented 6 years ago

If you only want to detect a single face (webcam), I suggest you try creating your own MobileNet with two outputs:
- face: sigmoid [classification]
- bounding-box: 4 values, linear [regression]

You can use a combined loss: binary_crossentropy for the face classification and mean_square_error for the bounding box.

Otherwise, you can opt for existing methods like Viola-Jones, or this (an algorithm similar to Viola-Jones, but replacing the Haar features with comparisons between pairs of pixels): https://github.com/tehnokv/picojs

Keep in mind that you will lose accuracy, but you will gain a lot of FPS.
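The combined loss suggested above can be written out in plain JavaScript for illustration. `combinedLoss` and the `lambda` weight on the regression term are assumptions made for this sketch, not face-api.js code:

```javascript
// Binary cross-entropy on the face/no-face probability plus mean squared
// error on the 4 bounding-box values, as suggested above. lambda weights
// the regression term (an assumed hyperparameter).
function combinedLoss(faceTrue, faceProb, boxTrue, boxPred, lambda = 1) {
  const eps = 1e-7
  const p = Math.min(Math.max(faceProb, eps), 1 - eps)
  const bce = -(faceTrue * Math.log(p) + (1 - faceTrue) * Math.log(1 - p))
  const mse = boxTrue.reduce((sum, t, i) => sum + (t - boxPred[i]) ** 2, 0) / boxTrue.length
  return bce + lambda * mse
}

// Perfect box prediction, 50/50 face score: loss is just -ln(0.5) ≈ 0.693
console.log(combinedLoss(1, 0.5, [0.2, 0.3, 0.5, 0.5], [0.2, 0.3, 0.5, 0.5]))
```

In a real training setup the two heads would share the MobileNet backbone and this loss would be minimized jointly; when no face is present, the regression term would typically be masked out.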

justadudewhohacks commented 6 years ago

The problem is that MobileNet itself is already slower than MTCNN, at least from what I measured on PC, laptop, and mobile.

But sure, it's probably faster than SSD if you only have to predict a single bounding box.

justadudewhohacks commented 6 years ago

I will soon publish a new face detector (Tiny YOLOv2, using depthwise separable convolutions). It is much faster than MTCNN on mobile (I get processing times of around ~100 ms, i.e. 4 - 5 FPS, on mobile Android) and seems to be about as fast as MTCNN on my desktop and laptop, but it's much more stable.

You can already try it out via the link I posted in #52. I will close this issue.

veetechh commented 5 years ago

Is there any way to improve speed and stability?

I'm trying version 0.15, the webfacetracking example with the tiny face detector and a 128 input size.

Rout0710 commented 2 years ago

I'm using other ML models alongside face-api. The video from my webcam lags a bit when I add face-api to my project. Is there any way to smooth the video that users see on their laptops? Thanks!

Icoder9699 commented 1 year ago

Hi everyone! I decreased the inputSize option in TinyFaceDetectorOptions. It helped me :)
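Lowering inputSize can look like the sketch below (a hedged sketch of the face-api.js API as documented in its README; the '/models' path is an assumption). The tiny face detector only accepts input sizes divisible by 32, so the helper, which is not part of the library, rounds an arbitrary target to a valid value:

```javascript
// Round a desired input size to a multiple of 32, since the tiny face
// detector only accepts sizes divisible by 32 (e.g. 128, 160, 224, 320).
// nearestValidInputSize is an illustrative helper, not face-api.js API.
function nearestValidInputSize(target) {
  return Math.max(32, Math.round(target / 32) * 32)
}

// Hedged browser usage sketch (models served from an assumed '/models' path):
//   await faceapi.nets.tinyFaceDetector.loadFromUri('/models')
//   const options = new faceapi.TinyFaceDetectorOptions({
//     inputSize: nearestValidInputSize(150),  // smaller input -> faster, less precise
//     scoreThreshold: 0.5
//   })
//   const detections = await faceapi.detectAllFaces(video, options)

console.log(nearestValidInputSize(150))  // 160
```

Smaller input sizes trade detection range for speed: distant or small faces start to be missed well before the frame rate stops improving.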

chaitanya71998 commented 10 months ago

@Icoder9699 It's better to go with the SSD MobileNet detector and square images to get fast, reliable detection. I used an image size of 512x512 for desktops and 256x256 for mobile. Let me know how it works for you.
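The per-device sizing suggested above can be sketched as follows. `pickSquareSize` is an illustrative helper, and the commented usage is a hedged sketch of the SSD MobileNet detector API as documented in the face-api.js README (the '/models' path is an assumption):

```javascript
// Pick a square processing size per device class, following the suggestion
// above: 512x512 for desktops, 256x256 for mobile.
// pickSquareSize is an illustrative helper, not face-api.js API.
function pickSquareSize(isMobile) {
  return isMobile ? 256 : 512
}

// Hedged browser usage sketch: resize the frame to a square canvas of the
// chosen size, then run the SSD MobileNet detector on it:
//   const size = pickSquareSize(/* detect mobile however you prefer */ false)
//   const square = document.createElement('canvas')
//   square.width = square.height = size
//   square.getContext('2d').drawImage(video, 0, 0, size, size)
//   await faceapi.nets.ssdMobilenetv1.loadFromUri('/models')
//   const options = new faceapi.SsdMobilenetv1Options({ minConfidence: 0.6 })
//   const detections = await faceapi.detectAllFaces(square, options)

console.log(pickSquareSize(false))  // 512
```

Note that stretching a 4:3 frame into a square distorts the aspect ratio, so detected boxes need to be mapped back to the original frame size before drawing, similar to the forSize calls earlier in the thread.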