justadudewhohacks / face-api.js

JavaScript API for face detection and face recognition in the browser and nodejs with tensorflow.js
MIT License

Difference between running the algorithm on browser and nodeJS #771

Open mayankagarwal-cf opened 3 years ago

mayankagarwal-cf commented 3 years ago

Node version: 14.16.0
face-api.js: master (cloned)
Chrome version: 89.0.4389.90 (Official Build) (64-bit)

I was running tests to compare how the algorithm performs on the two platforms. For some reason, Node.js finds faces in images where the browser fails to find any. One example is George_W_Bush_0137.jpg from the Labeled Faces in the Wild (LFW) dataset.

Shouldn't the encoding algorithm be platform-agnostic? Why exactly does this happen?

Standalone code to compute the encoding in the browser:

<!DOCTYPE html>
<html>
<head>
  <script src="../../../dist/face-api.js"></script>
  <script src="../public/js/commons.js"></script>
  <script src="../public/js/faceDetectionControls.js"></script>
  <script type="text/javascript" src="https://code.jquery.com/jquery-2.1.1.min.js"></script>
</head>
<body>

  <input type="file" id="file" name="file"/><br><br>

  <button id="run" onclick="run()">Run</button><br><br>

  <script>

async function run() {
  // Load the detector, landmark and recognition models from the weights folder
  await faceapi.loadSsdMobilenetv1Model('../../../weights');
  await faceapi.loadFaceLandmarkModel('../../../weights');
  await faceapi.loadFaceRecognitionModel('../../../weights');

  const inputElement = document.getElementById('file');
  const file = inputElement.files[0];
  if (!file) {
    console.log('No file selected');
    return;
  }
  console.log(file);

  // Decode the selected file into an HTMLImageElement
  const img = await faceapi.bufferToImage(file);

  const encoding = await calculateEncoding(img);
  console.log(encoding);
}

async function calculateEncoding(img) {
  // detectSingleFace keeps the highest-confidence face in the image
  const result = await faceapi
    .detectSingleFace(img, getFaceDetectorOptions())
    .withFaceLandmarks()
    .withFaceDescriptor();

  // Only return a descriptor if a face was found
  return result ? result.descriptor : undefined;
}

  </script>
</body>
</html>

NodeJS:

// Encodes one image whose path is passed on the command line

const faceapi = require('face-api.js');
// ./commons: the helper module from the face-api.js Node.js examples
const { canvas, faceDetectionNet, faceDetectionOptions } = require('./commons');

async function run() {
  // Load models
  await faceDetectionNet.loadFromDisk('../../weights');
  await faceapi.nets.faceLandmark68Net.loadFromDisk('../../weights');
  await faceapi.nets.faceRecognitionNet.loadFromDisk('../../weights');

  const encoding = await calculateEncoding(process.argv[2]);
  console.log(encoding);
}

async function calculateEncoding(imagePath) {
  const img = await canvas.loadImage(imagePath);

  // detectSingleFace keeps the highest-confidence face in the image
  const result = await faceapi
    .detectSingleFace(img, faceDetectionOptions)
    .withFaceLandmarks()
    .withFaceDescriptor();

  // Only return a descriptor if a face was found
  return result ? result.descriptor : undefined;
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
vladmandic commented 3 years ago

What are the actual detection options? In the browser case you're calling the undocumented function getFaceDetectorOptions, and in the Node case you're importing them from commons; neither is known.

Also, why the difference in loading (plus mix-and-match methods)? Use faceapi.nets.ssdMobilenetv1.load in the browser and faceapi.nets.ssdMobilenetv1.loadFromDisk in Node.
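A minimal sketch of that suggestion, keeping the two environments symmetric: the same three nets, loaded in the same order, with load in the browser and loadFromDisk in Node. The loadModels helper and its isNode flag are hypothetical and not part of face-api.js; only faceapi.nets.*.load and faceapi.nets.*.loadFromDisk come from the comment above.

```javascript
// Hypothetical helper (not part of face-api.js): load the same three nets
// with the matching method family for the current environment.
async function loadModels(faceapi, weightsPath, isNode = defaultIsNode()) {
  const nets = [
    faceapi.nets.ssdMobilenetv1,
    faceapi.nets.faceLandmark68Net,
    faceapi.nets.faceRecognitionNet,
  ];
  for (const net of nets) {
    if (isNode) {
      // Node: read the weight files from the filesystem
      await net.loadFromDisk(weightsPath);
    } else {
      // Browser: fetch the weight files over HTTP
      await net.load(weightsPath);
    }
  }
}

// Rough environment check; `process` is undefined in a plain browser page.
function defaultIsNode() {
  return typeof process !== 'undefined' && !!(process.versions && process.versions.node);
}

// Usage (browser): await loadModels(faceapi, '../../../weights', false);
// Usage (Node):    await loadModels(faceapi, '../../weights', true);
```

Passing the flag explicitly, as in the usage lines, avoids relying on the environment check in bundlers that polyfill process.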

mayankagarwal-cf commented 3 years ago

I've been using the examples provided. They point to the same code, but to make sure that wasn't the reason, here are the updated parts:

nodeJS:

    await faceapi.nets.ssdMobilenetv1.loadFromDisk('../../weights')
    await faceapi.nets.faceLandmark68Net.loadFromDisk('../../weights')
    await faceapi.nets.faceRecognitionNet.loadFromDisk('../../weights')
-----

    //We want to detect the best face in each file
    const result = await faceapi.detectSingleFace(img, new faceapi.SsdMobilenetv1Options({ minConfidence }))
    .withFaceLandmarks()
    .withFaceDescriptor()

Browser:

    await faceapi.nets.ssdMobilenetv1.load('../../../weights')
    await faceapi.nets.faceLandmark68Net.load('../../../weights')
    await faceapi.nets.faceRecognitionNet.load('../../../weights')

    //We want to detect the best face in each file
    const result = await faceapi.detectSingleFace(img, new faceapi.SsdMobilenetv1Options({ minConfidence }))
    .withFaceLandmarks()
    .withFaceDescriptor()

The results are unchanged: the discrepancy persists.

vladmandic commented 3 years ago

This made me really curious...

I wrote a quick test and moved image loading and conversion to a tensor out of FaceAPI to rule that part out.

And yes, the browser and NodeJS return different values.
But in the browser, the WebGL and WASM backends both return the same values.
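To put a number on "different values", one option is to compare the two logged descriptors directly. The sketch below computes the Euclidean distance (the metric face-api.js uses for matching; faceapi.euclideanDistance does the same) and the largest per-element difference. The helper names and the sample arrays are hypothetical placeholders for the values printed by the browser and Node scripts.

```javascript
// Euclidean distance between two descriptors of equal length.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Largest absolute difference between corresponding elements.
function maxAbsDifference(a, b) {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
  let max = 0;
  for (let i = 0; i < a.length; i++) {
    max = Math.max(max, Math.abs(a[i] - b[i]));
  }
  return max;
}

// Hypothetical placeholders: paste the two logged descriptors here.
const browserDescriptor = Float32Array.from([0.1, -0.2, 0.3]);
const nodeDescriptor = Float32Array.from([0.1, -0.2, 0.3004]);
console.log(euclideanDistance(browserDescriptor, nodeDescriptor));
console.log(maxAbsDifference(browserDescriptor, nodeDescriptor));
```

Identical pipelines should give a distance of 0; a small nonzero value points at numeric drift rather than a different face being selected.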

So it's not image loading, and it's not backend-specific.
And it's not a different model, as FaceAPI makes no distinction between browser and Node when running inference.

So what's different? This is just a guess, but tf.image.resizeBilinear has different defaults in TFJS and TF.
For example, the alignCorners property is not specified by FaceAPI, so it's interpreted as false in TFJS and true in TF
(and there are probably more similar cases).
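To see how alignCorners alone can shift samples by a fraction of a pixel, the sketch below reimplements bilinear resizing of a single-channel image in plain JavaScript with the two coordinate mappings TF uses when halfPixelCenters is false: with alignCorners the scale is (in - 1) / (out - 1), so the first and last pixels of input and output coincide; without it the scale is in / out. It illustrates the guess above (which is revisited later in the thread) and is not the TFJS kernel; the function names are hypothetical.

```javascript
// Map an output index to a (fractional) source coordinate.
function sourceCoord(dst, inSize, outSize, alignCorners) {
  if (alignCorners && outSize > 1) {
    return (dst * (inSize - 1)) / (outSize - 1);
  }
  return (dst * inSize) / outSize;
}

// Bilinear resize of a single-channel, row-major image.
function resizeBilinear(src, inH, inW, outH, outW, alignCorners) {
  const out = new Float32Array(outH * outW);
  for (let y = 0; y < outH; y++) {
    const sy = sourceCoord(y, inH, outH, alignCorners);
    const y0 = Math.floor(sy);
    const y1 = Math.min(y0 + 1, inH - 1); // clamp at the bottom edge
    const fy = sy - y0;
    for (let x = 0; x < outW; x++) {
      const sx = sourceCoord(x, inW, outW, alignCorners);
      const x0 = Math.floor(sx);
      const x1 = Math.min(x0 + 1, inW - 1); // clamp at the right edge
      const fx = sx - x0;
      // Interpolate horizontally on two rows, then vertically between them
      const top = src[y0 * inW + x0] * (1 - fx) + src[y0 * inW + x1] * fx;
      const bottom = src[y1 * inW + x0] * (1 - fx) + src[y1 * inW + x1] * fx;
      out[y * outW + x] = top * (1 - fy) + bottom * fy;
    }
  }
  return out;
}

const row = Float32Array.from([0, 10, 20, 30]);
console.log(resizeBilinear(row, 1, 4, 1, 3, true));  // [0, 15, 30]
console.log(resizeBilinear(row, 1, 4, 1, 3, false)); // approx [0, 13.33, 26.67]
```

Every interior sample moves between the two settings, which is the kind of small offset that could then propagate through the network.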

It makes sense: when you look at the result outputs, the face bounding box is slightly shifted (by about half a pixel), and that's enough to create small cascading differences in the descriptor and elsewhere.

Now, if there is an actual reason to "fix" this,
I can do that in my newer fork https://github.com/vladmandic/face-api
Otherwise, this will stay a curiosity.

mayankagarwal-cf commented 3 years ago

Yes, this has had me baffled for a couple of days.

I tried that. The flow doesn't even enter the conditional block for the example image I mentioned:

          if (imgTensor.shape[1] !== inputSize || imgTensor.shape[2] !== inputSize) {
            imgTensor = tf.image.resizeBilinear(imgTensor, [inputSize, inputSize], true);
            console.log("resizeBilinear alignCorners arg changed to true");
          }

But do check it on your end as I'm not familiar with the codebase and there might be something that I'm missing.

Don't you think fixing this, or at least finding the root cause, would be beneficial, since it looks like the browser is underperforming? However, I've only found two instances so far, so concluding that the browser always underperforms might be naive. Yes, I did look at your fork. Kudos on the work! Should I open an issue in your fork so the problem can be resolved there?

vladmandic commented 3 years ago

Yes, I was wrong. But at least I managed to find it in the end :)

It seems it's an actual bug in the implementation of tf.conv2d.

The actual function in FaceAPI is pointwiseConvLayer, which is literally the first operation performed in the mobileNetV1 model implementation, so this tiny difference cascades down and eventually produces the difference you're seeing.
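In MobileNet terms a pointwise layer is a 1x1 convolution: each output channel at a pixel is a weighted sum of that pixel's input channels, plus a bias, passed through ReLU6. The plain-JavaScript sketch below shows that computation to make clear why a small numeric error there matters, since every later layer consumes its output. The memory layout and function name are assumptions for illustration; face-api.js builds its layer on tf.conv2d, and this is not its code.

```javascript
// Pointwise (1x1) convolution with bias and ReLU6.
// Assumed layouts (hypothetical, chosen for clarity):
//   input:   [height * width * inChannels], channels interleaved per pixel
//   weights: [inChannels * outChannels], row-major (inChannel, outChannel)
//   bias:    [outChannels]
function pointwiseConv(input, height, width, inChannels, weights, bias) {
  const outChannels = bias.length;
  const out = new Float32Array(height * width * outChannels);
  for (let p = 0; p < height * width; p++) {
    for (let o = 0; o < outChannels; o++) {
      let acc = bias[o];
      // Weighted sum over the input channels of this single pixel
      for (let i = 0; i < inChannels; i++) {
        acc += input[p * inChannels + i] * weights[i * outChannels + o];
      }
      // ReLU6: clamp to the range [0, 6]
      out[p * outChannels + o] = Math.min(Math.max(acc, 0), 6);
    }
  }
  return out;
}

// One pixel, two input channels, two output channels.
const out = pointwiseConv([1, 2], 1, 1, 2, [1, -1, 0.5, 1], [0, 0]);
console.log(out); // [2, 1]
```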

See https://github.com/tensorflow/tfjs/issues/4843 for details.

There is one more difference between the browser and NodeJS: the initial JPEG decoding, which shifts pixel values by one (not a coordinate offset, but the RGB values themselves). Simply adding tf.add(input, 1) to the NodeJS decode workflow solves it.
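A sketch of checking for that one-level shift and applying the correction, in plain JavaScript over raw pixel buffers rather than tensors: meanDifference reports the average per-value offset between two decodes of the same file, and addOffset mirrors tf.add(input, 1), with no clamping, so 255 becomes 256. The helper names and the sample pixel values are hypothetical, and both buffers are assumed to hold the same RGB values in the same order.

```javascript
// Average of (a[i] - b[i]); close to 1 if b sits one level below a.
function meanDifference(a, b) {
  if (a.length !== b.length) throw new Error('size mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] - b[i];
  return sum / a.length;
}

// Mirrors tf.add(input, offset): no clamping, results stored as floats.
function addOffset(pixels, offset = 1) {
  const out = new Float32Array(pixels.length);
  for (let i = 0; i < pixels.length; i++) out[i] = pixels[i] + offset;
  return out;
}

// Hypothetical sample: the same RGB values as decoded in each environment.
const browserPixels = Uint8Array.from([10, 20, 30]);
const nodePixels = Uint8Array.from([9, 19, 29]);
console.log(meanDifference(browserPixels, nodePixels));            // 1
console.log(meanDifference(browserPixels, addOffset(nodePixels))); // 0
```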

vladmandic commented 3 years ago

@mayankagarwal-cf

Just closing the loop...

It seems the issue was the conv2d implementation in the old TF1 binaries used by tfjs-node.
tfjs-node 3.5.0 finally ships with TF2 binaries, and this issue is resolved.
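Since the fix arrived with a dependency upgrade rather than a code change, one way to avoid silently running the old binaries is to check the installed version at startup. The comparison helper below is a hypothetical sketch (pre-release suffixes are ignored); reading the version from @tensorflow/tfjs-node/package.json, as in the commented usage, is an assumption about how the package is installed.

```javascript
// Returns true if `version` is >= `minimum` (numeric dotted versions).
// Pre-release suffixes such as "-rc.1" are ignored (hypothetical simplification).
function isAtLeast(version, minimum) {
  const parse = (v) => v.split('-')[0].split('.').map((n) => parseInt(n, 10) || 0);
  const a = parse(version);
  const b = parse(minimum);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // equal versions
}

// Hypothetical usage in a Node.js script that depends on @tensorflow/tfjs-node:
// const { version } = require('@tensorflow/tfjs-node/package.json');
// if (!isAtLeast(version, '3.5.0')) {
//   throw new Error(`tfjs-node ${version} predates 3.5.0; please upgrade`);
// }
```

Comparing numeric components, rather than the raw strings, keeps "3.10.0" correctly above "3.5.0".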