takuya-takeuchi / FaceRecognitionDotNet

The world's simplest facial recognition api for .NET on Windows, MacOS and Linux
MIT License

FaceRecognition.FaceEncodings() Too slow #4

Closed yuzifu closed 6 years ago

yuzifu commented 6 years ago

I created a WPF demo for face comparison; the code is as follows:

        private void StartVideo()
        {
            tVideo = new Thread(new ThreadStart(() =>
            {
                using (var cap = new VideoCapture(0))
                {
                    if (!cap.IsOpened())
                        return;

                    OpenCvSharp.Point pLeftTop = new OpenCvSharp.Point();
                    OpenCvSharp.Point pRightBottom = new OpenCvSharp.Point();
                    int roiWidth = 260;
                    int roiHeight = 360;

                    Mat target = Cv2.ImRead("yuzifu.jpg");
                    var arrtarget = new byte[target.Width * target.Height * target.ElemSize()];
                    Marshal.Copy(target.Data, arrtarget, 0, arrtarget.Length);
                    var temptarget = Dlib.LoadImageData<RgbPixel>(arrtarget, (uint)target.Height, (uint)target.Width, (uint)(target.Width * target.ElemSize()));
                    var imgtarget = FaceRecognition.LoadImageData(temptarget);
                    var endodings = this._FaceRecognition.FaceEncodings(imgtarget).ToArray();

                    cap.FrameWidth = 800;
                    cap.FrameHeight = 600;
                    cap.FourCC = "MJPG";
                    using (Mat srcImg = new Mat())
                    {
                        while (running)
                        {
                            cap.Read(srcImg);
                            if (srcImg.Width > 0 && srcImg.Height > 0)
                            {
                                Cv2.Flip(srcImg, srcImg, FlipMode.Y);

                                pLeftTop.X = (srcImg.Width - roiWidth) / 2;
                                pLeftTop.Y = (srcImg.Height - roiHeight) / 2;
                                pRightBottom.X = pLeftTop.X + roiWidth;
                                pRightBottom.Y = pLeftTop.Y + roiHeight;
                                OpenCvSharp.Rect roi = new OpenCvSharp.Rect(pLeftTop.X, pLeftTop.Y, roiWidth, roiHeight);
                                Mat imageROI = srcImg.Clone(roi);

                                bool result = CompareFace(imageROI, endodings);

                                srcImg.Rectangle(pLeftTop, pRightBottom, Scalar.FromRgb(0, 255, 150));
                                Dispatcher.Invoke(()=>
                                {
                                    if (!srcImg.IsDisposed)
                                        FrontVideo.Source = srcImg.ToBitmapSource();
                                    Name.Text = result ? "Yuzifu" : "Unknown";
                                });
                            }
                        }
                    }
                }
            }));
            tVideo.Start();
        }

        private bool CompareFace(Mat source, FaceEncoding[] endtarget)
        {
            bool rtn = false;
            var arrsource = new byte[source.Width * source.Height * source.ElemSize()];
            Marshal.Copy(source.Data, arrsource, 0, arrsource.Length);
            var tempsource = Dlib.LoadImageData<RgbPixel>(arrsource, (uint)source.Height, (uint)source.Width, (uint)(source.Width * source.ElemSize()));

            using (var imgsource = FaceRecognition.LoadImageData(tempsource))
            {
                var locals = this._FaceRecognition.FaceLocations(imgsource);

                // too slow
                var endodings1 = this._FaceRecognition.FaceEncodings(imgsource, locals).ToArray();

                foreach (var encoding in endodings1)
                    foreach (var compareFace in FaceRecognition.CompareFaces(endtarget, encoding))
                    {
                        if (compareFace)
                        {
                            rtn = true;
                            break;
                        }
                    }

                foreach (var encoding in endodings1)
                    encoding.Dispose();
            }

            return rtn;
        }

        // Add func to FaceRecognition
        public static Image LoadImageData(Array2D<RgbPixel> array)
        {
            return new Image(new Matrix<RgbPixel>(array));
        }

By measuring the running time, I found that FaceEncodings() takes a lot of time: at least 600 milliseconds, and often more than 1 second.
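The nested comparison loops in CompareFace above can short-circuit as soon as one known encoding matches. A minimal pure-Python sketch of that match-any logic, using Euclidean distance and the 0.6 tolerance that face_recognition uses by default (the 3-D "encodings" here are made-up illustration values; real face encodings are 128-D):

```python
import math

TOLERANCE = 0.6  # default distance tolerance in face_recognition

def face_distance(a, b):
    """Euclidean distance between two encoding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_match(known_encodings, candidate, tolerance=TOLERANCE):
    """True as soon as any known encoding is within tolerance (short-circuits)."""
    return any(face_distance(k, candidate) <= tolerance for k in known_encodings)

known = [[0.1, 0.2, 0.3], [0.8, 0.8, 0.8]]
print(is_match(known, [0.12, 0.21, 0.33]))  # close to the first encoding -> True
print(is_match(known, [0.5, 0.1, 0.9]))     # far from both -> False
```

Because `any` stops at the first hit, the expensive distance computation is skipped for the remaining known encodings once a match is found.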

takuya-takeuchi commented 6 years ago

That's strange. I tested lenna.jpg (512x480) and got the following performance:

Mode | Total | Average
---- | ----- | -------
GPU | 3656 ms | 36 ms
CPU | 6776 ms | 67 ms

By the way, do you dispose the tempsource instance? If not, it may cause a memory leak, and the resulting memory pressure could degrade performance.

yuzifu commented 6 years ago

Thank you for your reply. I tried disposing tempsource, but performance did not improve. I also tried comparing lenna.jpg; the code is as follows:

        private void lennatest()
        {
            Mat target = Cv2.ImRead("lenna_small.jpg"); // 481x512 cut from lenna.jpg
            var arrtarget = new byte[target.Width * target.Height * target.ElemSize()];
            Marshal.Copy(target.Data, arrtarget, 0, arrtarget.Length);
            using (var temptarget = Dlib.LoadImageData<RgbPixel>(arrtarget, (uint)target.Height, (uint)target.Width, (uint)(target.Width * target.ElemSize())))
            {
                var imgtarget = FaceRecognition.LoadImageData(temptarget);
                var endodings = this._FaceRecognition.FaceEncodings(imgtarget).ToArray();

                Mat source = Cv2.ImRead("lenna.jpg"); //512x512

                int cnt = 0;
                long total = 0;
                long max = 0;
                while (cnt++ < 100)
                {
                    System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch();
                    watch.Start();
                    bool result = CompareFace(source, endodings);
                    watch.Stop();
                    total += watch.ElapsedMilliseconds;
                    if (watch.ElapsedMilliseconds > max)
                        max = watch.ElapsedMilliseconds;
                }

                Name.Text = max.ToString();
                Second.Text = (total / 100.0).ToString();
            }
        }

        private bool CompareFace(Mat source, FaceEncoding[] endtarget)
        {
            bool rtn = false;
            var arrsource = new byte[source.Width * source.Height * source.ElemSize()];
            Marshal.Copy(source.Data, arrsource, 0, arrsource.Length);
            using (var tempsource = Dlib.LoadImageData<RgbPixel>(arrsource, (uint)source.Height, (uint)source.Width, (uint)(source.Width * source.ElemSize())))
            {
                using (var imgsource = FaceRecognition.LoadImageData(tempsource))
                {
                    var locals = this._FaceRecognition.FaceLocations(imgsource);
                    var endodings1 = this._FaceRecognition.FaceEncodings(imgsource, locals).ToArray();

                    foreach (var encoding in endodings1)
                        foreach (var compareFace in FaceRecognition.CompareFaces(endtarget, encoding))
                        {
                            if (compareFace)
                            {
                                rtn = true;
                                break;
                            }
                        }

                    foreach (var encoding in endodings1)
                        encoding.Dispose();
                }
            }

            return rtn;
        }

The running time is between 30-40 ms, but no comparison is actually happening, because FaceRecognition.FaceLocations() returns a zero-length result.

takuya-takeuchi commented 6 years ago

Thank you for providing the test code. Please give me some time to check and reproduce it.

yuzifu commented 6 years ago

I found that Native.loss_metric_operator_matrixs() takes most of the time.

takuya-takeuchi commented 6 years ago

If you use NuGet, could you tell me which FaceRecognitionDotNet version you use? I always test the latest code, so this issue may only occur in a certain version.

I have time to work on this issue today, so I need more clues.

yuzifu commented 6 years ago

I am using the source code cloned from the repository on September 8, 2018, including DlibDotNet.Native.

yuzifu commented 6 years ago

I am trying to add FaceRecognitionDotNet and DlibDotNet from NuGet, but it shows the error "Failed to add reference to 'DlibDotNet.Native'" when I add DlibDotNet. Do I need to compile DlibDotNet.Native from source using CMake?

takuya-takeuchi commented 6 years ago

Yes, I know about this issue. Sorry for the trouble.

yuzifu commented 6 years ago

You are welcome, thank you very much for your work.

takuya-takeuchi commented 6 years ago

I published a new FaceRecognitionDotNet package, 1.2.3.2. It works fine.

I was able to reproduce the issue. Yes, the library cannot detect a face in lenna.jpg, so FaceEncodings returns a zero-length array. However, that does not mean FaceRecognitionDotNet does not work.

The image below works fine. https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/President_Barack_Obama.jpg/512px-President_Barack_Obama.jpg

So I must check the original face_recognition source.

takuya-takeuchi commented 6 years ago

I checked face_recognition on Ubuntu 16.04.4 using Python. In conclusion, no face was detected in the lenna image.

(screenshot)

yuzifu commented 6 years ago

The failure to detect the face is the problem: in my business logic, if no face is detected, the logic stops.

I tested again with the latest code. The conclusion is: if no face is detected in the image, FaceEncodings() is very fast; if a face is detected, FaceEncodings() is very slow.

takuya-takeuchi commented 6 years ago

I cannot conclude yet, but the performance issue may be caused by dlib. I will prepare a real Linux machine and measure the performance of face_recognition.

yuzifu commented 6 years ago

I created a console demo with DlibDotNet 19.15.0.20180911 and FaceRecognitionDotNet 1.2.3.2 installed from nuget.org; the test code is as follows:

            long total = 0;
            long max = 0;
            public void EncodingTest()
            {
                var imgtarget = FaceRecognition.LoadImageFile("512px-President_Barack_Obama.jpg");
                int cnt = 0;

                while (cnt++ < 100)
                {
                    var lo = _FaceRecognition.FaceLocations(imgtarget);
                    System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch();
                    watch.Start();
                    var tmp = _FaceRecognition.FaceEncodings(imgtarget).ToArray();
                    watch.Stop();
                    total += watch.ElapsedMilliseconds;
                    if (watch.ElapsedMilliseconds > max)
                        max = watch.ElapsedMilliseconds;
                }

                double avg = total / 100.0;
                Console.Write(string.Format("Max: {0}, Avg: {1}", max, avg));
                Console.ReadLine();
            }

With the test image https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/President_Barack_Obama.jpg/512px-President_Barack_Obama.jpg, it detects two face locations; the output is:

Max: 1378, Avg: 1015.57

With the test image https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png, it cannot detect a face; the output is:

Max: 166, Avg: 113.19

yuzifu commented 6 years ago

On macOS 10.13.6, I checked the above two images using face_recognition; the output is as follows: (screenshot)

yuzifu commented 6 years ago

FaceRecognitionDotNet detected two faces in 512px-President_Barack_Obama.jpg, and they are duplicates. FaceRecognitionDotNet does not detect any face in Lenna_(test_image).png.

The test code is as follows:

            public void EncodingTest(string file)
            {
                var imgtarget = FaceRecognition.LoadImageFile(file);
                int cnt = 0;
                long total = 0;
                long max = 0;

                var lo = _FaceRecognition.FaceLocations(imgtarget);
                foreach (var i in lo)
                    Console.WriteLine(string.Format("{0}: {1},{2},{3},{4}", file, i.Left, i.Top, i.Right, i.Bottom));

                while (cnt++ < 100)
                {
                    System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch();
                    watch.Start();
                    var tmp = _FaceRecognition.FaceEncodings(imgtarget).ToArray();
                    watch.Stop();
                    total += watch.ElapsedMilliseconds;
                    if (watch.ElapsedMilliseconds > max)
                        max = watch.ElapsedMilliseconds;
                }

                double avg = total / 100.0;
                Console.WriteLine(string.Format("{0}: Max - {1}, Avg - {2}", file, max, avg));
            }

The output is as follows:

512px-President_Barack_Obama.jpg: 189,79,314,203
512px-President_Barack_Obama.jpg: 189,79,314,203
512px-President_Barack_Obama.jpg: Max - 2077, Avg - 1205.05
Lenna_(test_image).png: Max - 241, Avg - 202.45

takuya-takeuchi commented 6 years ago

I reproduced this on macOS using Python:

$ python3 face_detection_cli.py ~/Work/tmp/face_recognition/
/Users/spitzbergen/Work/tmp/face_recognition/512px-President_Barack_Obama.jpg,79,314,203,189
/Users/spitzbergen/Work/tmp/face_recognition/Lenna.png,228,377,377,228

And on Windows using Python:

(D:\Works\Python\Envs\face_recognition) d:\Works\Local\face_recognition\face_recognition>face_detection .
.\512px-President_Barack_Obama.jpg,79,314,203,189
.\Lenna.png,228,377,377,228

Something is wrong in FaceRecognitionDotNet or DlibDotNet. If it is corrected, performance may improve.

yuzifu commented 6 years ago

The results of FaceLocations() differ from face_locations(), so the FaceLocations bug may need to be fixed before improving performance.

takuya-takeuchi commented 6 years ago

NOTE

benchmark.py on Windows

(D:\Works\Python\Envs\face_recognition) D:\Works\Local\face_recognition\examples>python benchmark.py
Benchmarks (Note: All benchmarks are only using a single CPU core)

Timings at 240p:
 - Face locations: 0.0518s (19.32 fps)
 - Face landmarks: 0.0022s (461.01 fps)
 - Encode face (inc. landmarks): 0.0232s (43.18 fps)
 - End-to-end: 0.0802s (12.46 fps)

Timings at 480p:
 - Face locations: 0.2024s (4.94 fps)
 - Face landmarks: 0.0022s (451.43 fps)
 - Encode face (inc. landmarks): 0.0224s (44.64 fps)
 - End-to-end: 0.2438s (4.10 fps)

Timings at 720p:
 - Face locations: 0.4559s (2.19 fps)
 - Face landmarks: 0.0022s (446.57 fps)
 - Encode face (inc. landmarks): 0.0233s (42.87 fps)
 - End-to-end: 0.5005s (2.00 fps)

Timings at 1080p:
 - Face locations: 1.0234s (0.98 fps)
 - Face landmarks: 0.0022s (450.31 fps)
 - Encode face (inc. landmarks): 0.0223s (44.89 fps)
 - End-to-end: 1.0650s (0.94 fps)

yuzifu commented 6 years ago

Performance on Windows 10 on my laptop (i7-6500U):

Benchmarks (Note: All benchmarks are only using a single CPU core)

Timings at 240p:
 - Face locations: 0.0772s (12.96 fps)
 - Face landmarks: 0.0032s (317.40 fps)
 - Encode face (inc. landmarks): 0.4384s (2.28 fps)
 - End-to-end: 0.5634s (1.77 fps)

Timings at 480p:
 - Face locations: 0.3179s (3.15 fps)
 - Face landmarks: 0.0032s (313.71 fps)
 - Encode face (inc. landmarks): 0.4389s (2.28 fps)
 - End-to-end: 0.7478s (1.34 fps)

Timings at 720p:
 - Face locations: 0.6904s (1.45 fps)
 - Face landmarks: 0.0032s (314.69 fps)
 - Encode face (inc. landmarks): 0.4395s (2.28 fps)
 - End-to-end: 1.1304s (0.88 fps)

Timings at 1080p:
 - Face locations: 1.5497s (0.65 fps)
 - Face landmarks: 0.0034s (295.68 fps)
 - Encode face (inc. landmarks): 0.4386s (2.28 fps)
 - End-to-end: 1.9922s (0.50 fps)

takuya-takeuchi commented 6 years ago

NOTE about face_detection_ex

From the dlib example program 'face_detection_ex':

            cout << "processing image " << argv[i] << endl;
            array2d<unsigned char> img;
            load_image(img, argv[i]);
            // Make the image bigger by a factor of two.  This is useful since
            // the face detector looks for faces that are about 80 by 80 pixels
            // or larger.  Therefore, if you want to find faces that are smaller
            // than that then you need to upsample the image as we do here by
            // calling pyramid_up().  So this will allow it to detect faces that
            // are at least 40 by 40 pixels in size.  We could call pyramid_up()
            // again to find even smaller faces, but note that every time we
            // upsample the image we make the detector run slower since it must
            // process a larger image.
            pyramid_up(img);

            // Now tell the face detector to give us a list of bounding boxes
            // around all the faces it can find in the image.
            std::vector<rectangle> dets = detector(img);

In FaceRecognitionDotNet, the detector is called with 1 as the 2nd argument and pyramid_up is not called. So I tried changing the logic and comparing the results.

Test 1

The code was changed to:

            cout << "processing image " << argv[i] << endl;
            array2d<unsigned char> img;
            load_image(img, argv[i]);
            // Make the image bigger by a factor of two.  This is useful since
            // the face detector looks for faces that are about 80 by 80 pixels
            // or larger.  Therefore, if you want to find faces that are smaller
            // than that then you need to upsample the image as we do here by
            // calling pyramid_up().  So this will allow it to detect faces that
            // are at least 40 by 40 pixels in size.  We could call pyramid_up()
            // again to find even smaller faces, but note that every time we
            // upsample the image we make the detector run slower since it must
            // process a larger image.
            //pyramid_up(img);

            // Now tell the face detector to give us a list of bounding boxes
            // around all the faces it can find in the image.
            std::vector<rectangle> dets = detector(img, 1);

And it was tested with the following image. (image omitted)

Test 2

Next, the code was changed to:

            cout << "processing image " << argv[i] << endl;
            array2d<unsigned char> img;
            load_image(img, argv[i]);
            // Make the image bigger by a factor of two.  This is useful since
            // the face detector looks for faces that are about 80 by 80 pixels
            // or larger.  Therefore, if you want to find faces that are smaller
            // than that then you need to upsample the image as we do here by
            // calling pyramid_up().  So this will allow it to detect faces that
            // are at least 40 by 40 pixels in size.  We could call pyramid_up()
            // again to find even smaller faces, but note that every time we
            // upsample the image we make the detector run slower since it must
            // process a larger image.
            pyramid_up(img);

            // Now tell the face detector to give us a list of bounding boxes
            // around all the faces it can find in the image.
            std::vector<rectangle> dets = detector(img, 1);

But face_recognition returns locations that correspond to the original-scale image, so pyramid_up should not be called.
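The point above can be sketched numerically: pyramid_up doubles the image, so a box found on the upsampled image must be scaled back down (once per upsample step) to land on the original image. A small pure-Python illustration, where the rect is an assumed (left, top, right, bottom) tuple rather than a real dlib rectangle:

```python
def scale_rect_down(rect, times_upsampled):
    """Map a (left, top, right, bottom) box detected on an image that was
    upsampled `times_upsampled` times (2x each time) back to original coords."""
    factor = 2 ** times_upsampled
    return tuple(v // factor for v in rect)

# A box found after one pyramid_up (2x) maps back to half-size coordinates.
print(scale_rect_down((378, 158, 628, 406), 1))  # -> (189, 79, 314, 203)
```

If the wrapper forgets this back-mapping (or upsamples when the caller did not ask for it), the returned locations no longer correspond to the caller's image.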

Test 3

Finally, the code was changed to:

            cout << "processing image " << argv[i] << endl;
            array2d<unsigned char> img;
            load_image(img, argv[i]);
            // Make the image bigger by a factor of two.  This is useful since
            // the face detector looks for faces that are about 80 by 80 pixels
            // or larger.  Therefore, if you want to find faces that are smaller
            // than that then you need to upsample the image as we do here by
            // calling pyramid_up().  So this will allow it to detect faces that
            // are at least 40 by 40 pixels in size.  We could call pyramid_up()
            // again to find even smaller faces, but note that every time we
            // upsample the image we make the detector run slower since it must
            // process a larger image.
            //pyramid_up(img);

            // Now tell the face detector to give us a list of bounding boxes
            // around all the faces it can find in the image.
            std::vector<rectangle> dets = detector(img, 0);

Conclusion

Perhaps the 2nd argument of face_detector is passed through to the C++ side in face_recognition, or number_of_times_to_upsample effectively ends up as 0.

def _raw_face_locations(img, number_of_times_to_upsample=1, model="hog"):
    """
    Returns an array of bounding boxes of human faces in a image
    :param img: An image (as a numpy array)
    :param number_of_times_to_upsample: How many times to upsample the image looking for faces. Higher numbers find smaller faces.
    :param model: Which face detection model to use. "hog" is less accurate but faster on CPUs. "cnn" is a more accurate
                  deep-learning model which is GPU/CUDA accelerated (if available). The default is "hog".
    :return: A list of dlib 'rect' objects of found face locations
    """
    if model == "cnn":
        return cnn_face_detector(img, number_of_times_to_upsample)
    else:
        return face_detector(img, number_of_times_to_upsample)

takuya-takeuchi commented 6 years ago

Hi yuzifu, thank you for your kind support. I found that your "Encode face" performance is much slower than in my benchmark; the other measurements are not bad.

The problem you face is FaceEncodings performance. Could the performance issue be caused by your environment?

yuzifu commented 6 years ago

On macOS 10.13.6 on the same laptop (i7-6500U), FaceEncodings has much better performance:

Benchmarks (Note: All benchmarks are only using a single CPU core)

Timings at 240p:
 - Face locations: 0.0612s (16.33 fps)
 - Face landmarks: 0.0020s (505.57 fps)
 - Encode face (inc. landmarks): 0.0272s (36.78 fps)
 - End-to-end: 0.0883s (11.33 fps)

Timings at 480p:
 - Face locations: 0.2355s (4.25 fps)
 - Face landmarks: 0.0020s (499.46 fps)
 - Encode face (inc. landmarks): 0.0269s (37.14 fps)
 - End-to-end: 0.2687s (3.72 fps)

Timings at 720p:
 - Face locations: 0.5387s (1.86 fps)
 - Face landmarks: 0.0020s (509.91 fps)
 - Encode face (inc. landmarks): 0.0257s (38.97 fps)
 - End-to-end: 0.5650s (1.77 fps)

Timings at 1080p:
 - Face locations: 1.2311s (0.81 fps)
 - Face landmarks: 0.0020s (497.55 fps)
 - Encode face (inc. landmarks): 0.0270s (37.10 fps)
 - End-to-end: 1.2567s (0.80 fps)

masoudr said (https://github.com/ageitgey/face_recognition/issues/175#issue-257710508):

> the performance of this tool in Windows 10 was about a quarter in comparison with Ubuntu built with the same specs.

yuzifu commented 6 years ago

On your Windows 10 machine, face_recognition's performance is better than on mine; is your CPU better than mine? Do you have plans to fix FaceLocations in FaceRecognitionDotNet, which behaves differently from face_locations in face_recognition?

takuya-takeuchi commented 6 years ago

My machine uses i7-8700.

> Do you have plans to fix FaceLocations in FaceRecognitionDotNet differently than face_locations in face_recognition?

No. For now, the FaceLocations method is not completely wrong. I suspect FRDotNet may be using the detector incorrectly; about this, please refer to the NOTE about face_detection_ex above.

At least, changing the default value of the RawFaceLocations and FaceLocations argument to 0 will make it return the same result as face_recognition.

takuya-takeuchi commented 6 years ago

NOTE about performance of loss_metric in dlib

D:\Works\Lib\DLib\19.15\examples\build\MSVC14.1\64\Release>dnn_face_recognition_ex.exe 200px-President_Barack_Obama.jpg
277ms
number of people found in the image: 1

    // This call asks the DNN to convert each face image in faces into a 128D vector.
    // In this 128D vector space, images from the same person will be close to each other
    // but vectors from different people will be far apart.  So we can use these vectors to
    // identify if a pair of images are from the same person or from different people.  
    std::chrono::system_clock::time_point  start, end;
    start = std::chrono::system_clock::now();
    std::vector<matrix<float,0,1>> face_descriptors = net(faces);
    end = std::chrono::system_clock::now();
    double elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();

    cout << elapsed << "ms" << endl;

Open question

The performance of the original dlib sample is similar to FRDotNet's. This could mean the bottleneck is in dlib itself, or in how the native library is built, rather than in the wrapper code.

takuya-takeuchi commented 6 years ago

NOTE about compile options

Build for Python dlib:

-- pybind11 v2.2.2
-- Using CMake version: 3.11.0
-- Compiling dlib version: 19.15.0
-- SSE4 instructions can be executed by the host processor.
-- AVX instructions can be executed by the host processor.
-- Enabling AVX instructions
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: D:/Works/Lib/DLib/19.15/build/temp.win-amd64-3.6/Release

Build for DlibDotNet.Native.Dnn:

-- Using CMake version: 3.11.0
-- Compiling dlib version: 19.15.0
-- Enabling SSE2 instructions
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: D:/Works/OpenSource/DlibDotNet/src/DlibDotNet.Native.Dnn/build_cpu

I checked CMakeCache.txt: on the Python side, the Intel MKL library is used as BLAS.

Python dlib

DLib\19.15\build\temp.win-amd64-3.6\Release\CMakeCache.txt

//Path to a library.
BLAS_Accelerate_LIBRARY:FILEPATH=BLAS_Accelerate_LIBRARY-NOTFOUND

//Path to a library.
BLAS_acml_LIBRARY:FILEPATH=BLAS_acml_LIBRARY-NOTFOUND

//Path to a library.
BLAS_acml_mp_LIBRARY:FILEPATH=BLAS_acml_mp_LIBRARY-NOTFOUND

//Path to a library.
BLAS_blas_LIBRARY:FILEPATH=BLAS_blas_LIBRARY-NOTFOUND

//Path to a library.
BLAS_blis_LIBRARY:FILEPATH=BLAS_blis_LIBRARY-NOTFOUND

//Path to a library.
BLAS_complib.sgimath_LIBRARY:FILEPATH=BLAS_complib.sgimath_LIBRARY-NOTFOUND

//Path to a library.
BLAS_cxml_LIBRARY:FILEPATH=BLAS_cxml_LIBRARY-NOTFOUND

//Path to a library.
BLAS_dxml_LIBRARY:FILEPATH=BLAS_dxml_LIBRARY-NOTFOUND

//Path to a library.
BLAS_essl_LIBRARY:FILEPATH=BLAS_essl_LIBRARY-NOTFOUND

//Path to a library.
BLAS_f77blas_LIBRARY:FILEPATH=BLAS_f77blas_LIBRARY-NOTFOUND

//Path to a library.
BLAS_goto2_LIBRARY:FILEPATH=BLAS_goto2_LIBRARY-NOTFOUND

//Path to a library.
BLAS_libguide40_LIBRARY:FILEPATH=BLAS_libguide40_LIBRARY-NOTFOUND

//Path to a library.
BLAS_libiomp5md_LIBRARY:FILEPATH=C:/Program Files (x86)/Microsoft Visual Studio/Shared/Anaconda3_64/Library/lib/libiomp5md.lib

//Path to a library.
BLAS_mkl_core_dll_LIBRARY:FILEPATH=C:/Program Files (x86)/Microsoft Visual Studio/Shared/Anaconda3_64/Library/lib/mkl_core_dll.lib

//Path to a library.
BLAS_mkl_intel_c_dll_LIBRARY:FILEPATH=BLAS_mkl_intel_c_dll_LIBRARY-NOTFOUND

//Path to a library.
BLAS_mkl_intel_lp64_dll_LIBRARY:FILEPATH=C:/Program Files (x86)/Microsoft Visual Studio/Shared/Anaconda3_64/Library/lib/mkl_intel_lp64_dll.lib

//Path to a library.
BLAS_mkl_intel_thread_dll_LIBRARY:FILEPATH=C:/Program Files (x86)/Microsoft Visual Studio/Shared/Anaconda3_64/Library/lib/mkl_intel_thread_dll.lib

//Path to a library.
BLAS_openblas_LIBRARY:FILEPATH=BLAS_openblas_LIBRARY-NOTFOUND

//Path to a library.
BLAS_scsl_LIBRARY:FILEPATH=BLAS_scsl_LIBRARY-NOTFOUND

//Path to a library.
BLAS_sgemm_LIBRARY:FILEPATH=BLAS_sgemm_LIBRARY-NOTFOUND

//Path to a library.
BLAS_sunperf_LIBRARY:FILEPATH=BLAS_sunperf_LIBRARY-NOTFOUND

//Path to a library.
BLAS_vecLib_LIBRARY:FILEPATH=BLAS_vecLib_LIBRARY-NOTFOUND

//Compile your program with AVX instructions
USE_AVX_INSTRUCTIONS:BOOL=ON

//Install pybind11 headers in Python include directory instead
// of default installation prefix
USE_PYTHON_INCLUDE_DIR:BOOL=OFF

//Compile your program with SSE2 instructions
USE_SSE2_INSTRUCTIONS:BOOL=ON

//Compile your program with SSE4 instructions
USE_SSE4_INSTRUCTIONS:BOOL=ON

yuzifu commented 6 years ago

FRDotNet detects two locations for each face.

Env

Windows 10 64-bit, .NET 4.7, DlibDotNet 19.15.0.20180913 from nuget.org, FaceRecognitionDotNet 1.2.3.2 from nuget.org

Code

            public void LocationsTest(string file)
            {
                var imgtarget = FaceRecognition.LoadImageFile(file);
                var lo = _FaceRecognition.FaceLocations(imgtarget);
                if (lo.Count() > 0)
                {
                    foreach (var i in lo)
                        Console.WriteLine(string.Format("{0}: {1},{2},{3},{4}", file, i.Left, i.Top, i.Right, i.Bottom));
                }
                else
                {
                    Console.WriteLine(string.Format("{0}: does not detected faces", file));
                }
            }

Output

1. Default numberOfTimesToUpsample:
> 512px-President_Barack_Obama.jpg: 189,79,314,203
> 512px-President_Barack_Obama.jpg: 189,79,314,203
> Lenna_(test_image).png: does not detected faces

2. numberOfTimesToUpsample = 0:
> 512px-President_Barack_Obama.jpg: 189,79,314,203
> 512px-President_Barack_Obama.jpg: 189,79,314,203
> Lenna_(test_image).png: 228,228,377,377
> Lenna_(test_image).png: 228,228,377,377

So setting numberOfTimesToUpsample to 0 solves the problem of the face not being detected, but the problem of duplicate face locations remains.
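Until the duplicate-detection issue is fixed in the library, the repeated boxes could be filtered on the caller's side. A minimal sketch that drops exact duplicates while preserving order (boxes are plain coordinate tuples here, as an assumption; a real fix might also merge near-duplicates by overlap):

```python
def dedupe_locations(locations):
    """Remove exact-duplicate face boxes, preserving first-seen order."""
    seen = set()
    unique = []
    for rect in locations:
        if rect not in seen:
            seen.add(rect)
            unique.append(rect)
    return unique

boxes = [(189, 79, 314, 203), (189, 79, 314, 203)]
print(dedupe_locations(boxes))  # -> [(189, 79, 314, 203)]
```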

UPDATE: the DlibDotNet version is 19.15.0.20180913.

takuya-takeuchi commented 6 years ago

I added printf to the dlib source code and found that the argument of face_locations is not passed through to the dlib side.

template <
        typename image_scanner_type
        >
    template <
        typename image_type
        >
    void object_detector<image_scanner_type>::
    operator() (
        const image_type& img,
        std::vector<rect_detection>& final_dets,
        double adjust_threshold
    ) 
    {
        printf("L432- %f\n", adjust_threshold);  // note: %f, since adjust_threshold is a double
        scanner.load(img);
        std::vector<std::pair<double, rectangle> > dets;
        std::vector<rect_detection> dets_accum;
        for (unsigned long i = 0; i < w.size(); ++i)
        {
            const double thresh = w[i].w(scanner.get_num_dimensions());
            scanner.detect(w[i].get_detect_argument(), dets, thresh + adjust_threshold);
            for (unsigned long j = 0; j < dets.size(); ++j)
            {
                rect_detection temp;
                temp.detection_confidence = dets[j].first-thresh;
                temp.weight_index = i;
                temp.rect = dets[j].second;
                dets_accum.push_back(temp);
            }
        }

adjust_threshold was 0.

But

def face_locations(img, number_of_times_to_upsample=1, model="hog"):
    """
    Returns an array of bounding boxes of human faces in a image
    :param img: An image (as a numpy array)
    :param number_of_times_to_upsample: How many times to upsample the image looking for faces. Higher numbers find smaller faces.
    :param model: Which face detection model to use. "hog" is less accurate but faster on CPUs. "cnn" is a more accurate
                  deep-learning model which is GPU/CUDA accelerated (if available). The default is "hog".
    :return: A list of tuples of found face locations in css (top, right, bottom, left) order
    """
    if model == "cnn":
        return [_trim_css_to_bounds(_rect_to_css(face.rect), img.shape) for face in _raw_face_locations(img, number_of_times_to_upsample, "cnn")]
    else:
        return [_trim_css_to_bounds(_rect_to_css(face), img.shape) for face in _raw_face_locations(img, number_of_times_to_upsample, model)]

def _raw_face_locations(img, number_of_times_to_upsample=1, model="hog"):
    """
    Returns an array of bounding boxes of human faces in a image
    :param img: An image (as a numpy array)
    :param number_of_times_to_upsample: How many times to upsample the image looking for faces. Higher numbers find smaller faces.
    :param model: Which face detection model to use. "hog" is less accurate but faster on CPUs. "cnn" is a more accurate
                  deep-learning model which is GPU/CUDA accelerated (if available). The default is "hog".
    :return: A list of dlib 'rect' objects of found face locations
    """
    if model == "cnn":
        return cnn_face_detector(img, number_of_times_to_upsample)
    else:
        return face_detector(img, number_of_times_to_upsample)

number_of_times_to_upsample seems meaningless here. Perhaps this function always returns the same result even when the number_of_times_to_upsample value is changed.
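For reference, upsampling should matter for the HOG detector: dlib's HOG face detector scans a fixed sliding window (roughly 80x80 pixels), so each upsampling step doubles the image and halves the smallest detectable face. A rough sketch of that relationship (the 80px window size is dlib's documented default; treat the numbers as an approximation, not part of the code above):

```python
# Approximate smallest detectable face size for dlib's HOG detector.
# The detector scans a fixed ~80x80 pixel window; upsampling the input
# by 2x per step lets that window cover proportionally smaller faces
# in the original image.

HOG_WINDOW = 80  # dlib's default HOG face detector window size (approx.)

def min_detectable_face(number_of_times_to_upsample: int) -> float:
    """Smallest face (in original-image pixels) the detector can find."""
    return HOG_WINDOW / (2 ** number_of_times_to_upsample)

for n in range(3):
    print(n, min_detectable_face(n))
```

So if number_of_times_to_upsample truly has no effect, faces smaller than the base window would never be found, which is a testable symptom of the bug.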

takuya-takeuchi commented 6 years ago

Happy news!!! The Intel MKL library improves FaceEncoding performance, and FRDotNet can now achieve higher performance than Python!!

Python benchmark.py

(D:\Works\Python\Envs\face_recognition) D:\Works\Local\face_recognition\examples>python benchmark.py
Benchmarks (Note: All benchmarks are only using a single CPU core)

Timings at 240p:
 - Face locations: 0.0518s (19.32 fps)
 - Face landmarks: 0.0022s (461.01 fps)
 - Encode face (inc. landmarks): 0.0232s (43.18 fps)
 - End-to-end: 0.0802s (12.46 fps)

Timings at 480p:
 - Face locations: 0.2024s (4.94 fps)
 - Face landmarks: 0.0022s (451.43 fps)
 - Encode face (inc. landmarks): 0.0224s (44.64 fps)
 - End-to-end: 0.2438s (4.10 fps)

Timings at 720p:
 - Face locations: 0.4559s (2.19 fps)
 - Face landmarks: 0.0022s (446.57 fps)
 - Encode face (inc. landmarks): 0.0233s (42.87 fps)
 - End-to-end: 0.5005s (2.00 fps)

Timings at 1080p:
 - Face locations: 1.0234s (0.98 fps)
 - Face landmarks: 0.0022s (450.31 fps)
 - Encode face (inc. landmarks): 0.0223s (44.89 fps)
 - End-to-end: 1.0650s (0.94 fps)
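For comparison purposes, both benchmarks measure the same way: run each stage repeatedly on a resized test frame and report the mean wall-clock time per call. A minimal sketch of that timing harness (the function and names here are placeholders, not the actual benchmark.py code):

```python
import time

def benchmark(fn, iterations=10):
    """Return (seconds_per_call, fps) for fn, averaged over `iterations` runs."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = (time.perf_counter() - start) / iterations
    return elapsed, 1.0 / elapsed

# Hypothetical usage against a captured frame:
# secs, fps = benchmark(lambda: face_recognition.face_locations(frame))
# print(f" - Face locations: {secs:.4f}s ({fps:.2f} fps)")
```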

FaceRecognition Benchmark (Not published)

D:\Works\OpenSource\FaceRecognitionDotNet\examples\Benchmark>dotnet run -c Release -- "-m=models"
Using launch settings from D:\Works\OpenSource\FaceRecognitionDotNet\examples\Benchmark\Properties\launchSettings.json...
Benchmarks

Timings at 240p:
 - Face locations: 0.0268s (37.31 fps)
 - Face landmarks: 0.0014s (714.29 fps)
 - Encode face (inc. landmarks): 0.0210s (47.62 fps)
 - End-to-end: 0.0484s (20.66 fps)

Timings at 480p:
 - Face locations: 0.1068s (9.36 fps)
 - Face landmarks: 0.0014s (714.29 fps)
 - Encode face (inc. landmarks): 0.0202s (49.50 fps)
 - End-to-end: 0.1308s (7.65 fps)

Timings at 720p:
 - Face locations: 0.2416s (4.14 fps)
 - Face landmarks: 0.0014s (714.29 fps)
 - Encode face (inc. landmarks): 0.0206s (48.54 fps)
 - End-to-end: 0.2700s (3.70 fps)

Timings at 1080p:
 - Face locations: 0.5430s (1.84 fps)
 - Face landmarks: 0.0016s (625.00 fps)
 - Encode face (inc. landmarks): 0.0206s (48.54 fps)
 - End-to-end: 0.5774s (1.73 fps)
takuya-takeuchi commented 6 years ago

After the fix for #8, face location performance improved.

takuya-takeuchi commented 6 years ago

After the fix for #8:

D:\Works\OpenSource\FaceRecognitionDotNet\examples\Benchmark>dotnet run -c Release "-m=models"
Using launch settings from D:\Works\OpenSource\FaceRecognitionDotNet\examples\Benchmark\Properties\launchSettings.json...
Benchmarks

Timings at 240p:
 - Face locations: 0.0140s (71.43 fps)
 - Face landmarks: 0.0016s (625.00 fps)
 - Encode face (inc. landmarks): 0.0216s (46.30 fps)
 - End-to-end: 0.0370s (27.03 fps)

Timings at 480p:
 - Face locations: 0.0566s (17.67 fps)
 - Face landmarks: 0.0016s (625.00 fps)
 - Encode face (inc. landmarks): 0.0228s (43.86 fps)
 - End-to-end: 0.0870s (11.49 fps)

Timings at 720p:
 - Face locations: 0.1282s (7.80 fps)
 - Face landmarks: 0.0016s (625.00 fps)
 - Encode face (inc. landmarks): 0.0216s (46.30 fps)
 - End-to-end: 0.1716s (5.83 fps)

Timings at 1080p:
 - Face locations: 0.2876s (3.48 fps)
 - Face landmarks: 0.0016s (625.00 fps)
 - Encode face (inc. landmarks): 0.0214s (46.73 fps)
 - End-to-end: 0.3370s (2.97 fps)
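Reading the two FRDotNet runs together, the #8 fix roughly halves face-location time at every resolution. A quick check using the 1080p figures posted above:

```python
# Face-location timings (seconds) at 1080p from the benchmarks above.
before_fix = 0.5430  # before the #8 fix
after_fix = 0.2876   # after the #8 fix

speedup = before_fix / after_fix
print(f"{speedup:.2f}x faster")  # roughly 1.9x
```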
yuzifu commented 6 years ago

:+1:

takuya-takeuchi commented 6 years ago

Thank you for helping me to the end!!

yuzifu commented 6 years ago

@takuya-takeuchi Does DlibDotNet 19.15.0.20180916 use Intel MKL? On Windows 10 64-bit, I tested FRDotNet (1.2.3.4)'s Benchmark against face_recognition (1.2.3)'s benchmark.py and found no difference in their performance.

takuya-takeuchi commented 6 years ago

No. I don't know whether Intel allows OSS developers to build MKL source in, or embed it in binaries, for distribution. If you know, please let me know.