takuya-takeuchi / FaceRecognitionDotNet

The world's simplest facial recognition api for .NET on Windows, MacOS and Linux
MIT License
1.27k stars, 308 forks

(CPU) Speeding up Face Recognition #61

Closed: vulevnr closed this issue 5 years ago

vulevnr commented 5 years ago

Hi takuya-takeuchi! I read the "Speeding up Face Recognition" section of the documentation at https://github.com/ageitgey/face_recognition.

How do I set the number of CPU cores to use?

takuya-takeuchi commented 5 years ago

In Python face_recognition, the --cpus option means using multiple processes. You can check https://github.com/ageitgey/face_recognition/blob/a9dd28d5f97e2b5d83791548eeb9c24a807bca73/face_recognition/face_recognition_cli.py#L71

Python can launch copies of itself as worker processes and gather the results from them. This is built in as standard functionality (the multiprocessing module).

C#, however, has no built-in equivalent of that multiprocessing library and syntax. If we wanted to use multiple processes, we would have to design how to send image data to the workers, how to collect the results, and which communication protocol to use.

For that reason, a multiprocess feature does not fit the library's requirements.

Instead, you should write simple server applications and communicate with them. That approach can scale well.

vulevnr commented 5 years ago

Thank you very much! I used MKL 💃

timiil commented 4 years ago

Same question here: is there any way to use multiple threads to speed up the landmark calculation?

I am thinking about the code below:

public IEnumerable<FaceEncoding> FaceEncodings(Image image, IEnumerable<Location> knownFaceLocation = null, int numJitters = 1, PredictorModel model = PredictorModel.Small)
{
    if (image == null)
        throw new ArgumentNullException(nameof(image));
    if (model == PredictorModel.Custom)
        throw new NotSupportedException("FaceRecognitionDotNet.PredictorModel.Custom is not supported.");

    image.ThrowIfDisposed();
    this.ThrowIfDisposed();

    var rawLandmarks = this.RawFaceLandmarks(image, knownFaceLocation, model);
    foreach (var landmark in rawLandmarks)
    {
        var ret = new FaceEncoding(FaceRecognitionModelV1.ComputeFaceDescriptor(this._FaceEncoder, image, landmark, numJitters));
        landmark.Dispose();
        yield return ret;
    }
}

===> changed to:

public double[][] FaceEncodings(Image image, IEnumerable<Location> knownFaceLocation = null, int numJitters = 1, PredictorModel model = PredictorModel.Small)
{
    if (image == null)
        throw new ArgumentNullException(nameof(image));
    if (model == PredictorModel.Custom)
        throw new NotSupportedException("FaceRecognitionDotNet.PredictorModel.Custom is not supported.");

    image.ThrowIfDisposed();
    this.ThrowIfDisposed();

    var rawLandmarks = this.RawFaceLandmarks(image, knownFaceLocation, model);
    // List<T> is not safe for concurrent Add; ConcurrentBag<T> (System.Collections.Concurrent) is.
    // Note: ConcurrentBag does not preserve the order of the faces.
    var list = new ConcurrentBag<double[]>();
    var pos = new ParallelOptions { MaxDegreeOfParallelism = this.config.LandmarkThreads };
    Parallel.ForEach(rawLandmarks, pos, landmark =>
    {
        // In parallel mode this call hits a 'read write memory failed' exception:
        // every thread shares the same native this._FaceEncoder.
        var fe = new FaceEncoding(FaceRecognitionModelV1.ComputeFaceDescriptor(this._FaceEncoder, image, landmark, numJitters));
        var doubles = fe.ToArray();
        list.Add(doubles);
        landmark.Dispose();
    });

    return list.ToArray();
}

I think that if FaceRecognitionModelV1.ComputeFaceDescriptor supported thread-safe invocation, we could speed this up very easily.

takuya-takeuchi commented 4 years ago

@timiil RawFaceLandmarks uses a neural network, and it must be protected from concurrent access. So you must create a FaceRecognition object for each thread. However, landmark detection is very fast, so we cannot gain much from multithreading.
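A minimal sketch of the "one object per thread" pattern described above, assuming nothing about FaceRecognitionDotNet's internals. `NotThreadSafeEncoder` is a hypothetical stand-in for a `FaceRecognition` instance (it keeps mutable scratch state, so a single instance must not be called concurrently), and `PerThreadEncoding.EncodeAll` is a hypothetical helper. It uses the standard `Parallel.For` overload with `localInit`/`localFinally`, which gives each worker task its own encoder and disposes it when that task finishes. Results are written by index, so no lock is needed and the output keeps the input order.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a FaceRecognition instance: it reuses a scratch
// buffer, so concurrent calls on ONE instance would corrupt each other.
public sealed class NotThreadSafeEncoder : IDisposable
{
    private readonly double[] _scratch = new double[4];

    // Hypothetical "encode" step: derives a fake 4-value descriptor from an id.
    public double[] Encode(int faceId)
    {
        for (int i = 0; i < _scratch.Length; i++)
            _scratch[i] = faceId * 10 + i;
        return (double[])_scratch.Clone(); // copy so callers never share the buffer
    }

    public void Dispose() { }
}

public static class PerThreadEncoding
{
    public static double[][] EncodeAll(int[] faceIds, int maxThreads)
    {
        var results = new double[faceIds.Length][];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.For(0, faceIds.Length, options,
            // localInit: runs once per worker task, so each task owns its encoder
            // (at most MaxDegreeOfParallelism tasks run at the same time).
            () => new NotThreadSafeEncoder(),
            (i, state, encoder) =>
            {
                // Each index is written by exactly one iteration: no lock needed.
                results[i] = encoder.Encode(faceIds[i]);
                return encoder;
            },
            // localFinally: release the task's encoder when the task finishes.
            encoder => encoder.Dispose());

        return results;
    }
}
```

With the real library, `localInit` would presumably be where each task calls `FaceRecognition.Create(modelDirectory)`. Each instance loads its own models, so memory use grows with the number of tasks; on small devices a low `MaxDegreeOfParallelism` may be the better trade-off.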

timiil commented 4 years ago

@takuya-takeuchi I don't quite agree that landmark detection is fast. Running landmarks on a photo with one face takes about 300 ms on my Intel i5-8600K Windows 10 desktop, while a photo with 10 faces takes about 29XX ms. As you can see, the time grows roughly linearly with the number of faces, so I think implementing multithreaded landmark detection would be worthwhile, especially on ARM mobile devices.
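Taking these numbers at face value, the cost per face is roughly constant, about $t_{\text{face}} \approx 300$ ms. Under the idealized assumption that every face is encoded independently on its own core, with no locking, model-loading, or memory-bandwidth overhead, the best-case wall time for $n$ faces on $p$ cores would be roughly:

$$T_{\text{parallel}} \approx \left\lceil \frac{n}{p} \right\rceil \, t_{\text{face}}$$

For example, with $n = 10$ faces and $p = 6$ (the i5-8600K has six cores), this gives $\lceil 10/6 \rceil \times 300\ \text{ms} = 2 \times 300\ \text{ms} \approx 600$ ms, compared with the roughly 3 s measured serially. This is only an upper bound on the gain: any per-image work that is not split across faces, plus the cost of one model instance per thread, would reduce it.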

Correct me if I'm wrong, thanks.

takuya-takeuchi commented 4 years ago

@timiil OK, I understand. Do you mean that FRDN.MKL does not improve face landmark detection, or have you not tried FRDN.MKL yet? I think a GPU or Intel MKL can improve performance drastically.

Please check the detailed performance log: https://github.com/takuya-takeuchi/FaceRecognitionDotNet/issues/4#issuecomment-421353167

timiil commented 4 years ago

Could you tell me how to enable FRDN.MKL? I am running on a single Intel i5-8600K or i5-9400F CPU, and I have no idea whether MKL is being used or not. But I have noticed that the CPU seems to work on only ONE core, even when a photo contains 10 faces and 10 landmark encodings have to be calculated.

takuya-takeuchi commented 4 years ago

@timiil Download Intel MKL from Intel: https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html

Then pick up the required binaries from it, as described here: https://github.com/takuya-takeuchi/FaceRecognitionDotNet/wiki/Quickstart#for-mkl

takuya-takeuchi commented 4 years ago

Related issue https://github.com/takuya-takeuchi/DlibDotNet/issues/192