Closed: yuzifu closed this issue 6 years ago.
It's weird. I tested lenna.jpg (512x480) and got the following results.

Mode | Total | Average
---|---|---
GPU | 3656 ms | 36 ms
CPU | 6776 ms | 67 ms
By the way, do you dispose of the tempsource instance? If not, it may cause a memory leak, and the resulting memory pressure could hurt performance.
Thank you for your reply. I tried disposing tempsource, but performance did not improve. I compared against lenna.jpg with the following code:
```csharp
private void lennatest()
{
    Mat target = Cv2.ImRead("lenna_small.jpg"); // 481x512, cut from lenna.jpg
    var arrtarget = new byte[target.Width * target.Height * target.ElemSize()];
    Marshal.Copy(target.Data, arrtarget, 0, arrtarget.Length);
    using (var temptarget = Dlib.LoadImageData<RgbPixel>(arrtarget, (uint)target.Height, (uint)target.Width, (uint)(target.Width * target.ElemSize())))
    {
        var imgtarget = FaceRecognition.LoadImageData(temptarget);
        var encodings = this._FaceRecognition.FaceEncodings(imgtarget).ToArray();
        Mat source = Cv2.ImRead("lenna.jpg"); // 512x512
        int cnt = 0;
        long total = 0;
        long max = 0;
        while (cnt++ < 100)
        {
            var watch = System.Diagnostics.Stopwatch.StartNew();
            bool result = CompareFace(source, encodings);
            watch.Stop();
            total += watch.ElapsedMilliseconds;
            if (watch.ElapsedMilliseconds > max)
                max = watch.ElapsedMilliseconds;
        }
        Name.Text = max.ToString();
        Second.Text = (total / 100.0).ToString();
    }
}

private bool CompareFace(Mat source, FaceEncoding[] targetEncodings)
{
    bool rtn = false;
    var arrsource = new byte[source.Width * source.Height * source.ElemSize()];
    Marshal.Copy(source.Data, arrsource, 0, arrsource.Length);
    using (var tempsource = Dlib.LoadImageData<RgbPixel>(arrsource, (uint)source.Height, (uint)source.Width, (uint)(source.Width * source.ElemSize())))
    using (var imgsource = FaceRecognition.LoadImageData(tempsource))
    {
        var locations = this._FaceRecognition.FaceLocations(imgsource);
        var encodings = this._FaceRecognition.FaceEncodings(imgsource, locations).ToArray();
        foreach (var encoding in encodings)
        {
            foreach (var compareFace in FaceRecognition.CompareFaces(targetEncodings, encoding))
            {
                if (compareFace)
                {
                    rtn = true;
                    break; // note: this only exits the inner loop
                }
            }
        }
        foreach (var encoding in encodings)
            encoding.Dispose();
    }
    return rtn;
}
```
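As an aside, the break in the nested foreach above only exits the inner loop; the outer loop keeps running after a match. A LINQ Any() over the comparison results short-circuits both levels. A minimal sketch with plain bool arrays standing in for FaceRecognition.CompareFaces results (the data here is made up):

```csharp
using System;
using System.Linq;

static class CompareSketch
{
    // Returns true as soon as any comparison for any detected face is a match.
    public static bool AnyMatch(bool[][] comparisonsPerFace) =>
        comparisonsPerFace.Any(results => results.Any(r => r));

    static void Main()
    {
        // Hypothetical results for two detected faces.
        var comparisons = new[]
        {
            new[] { false, false },
            new[] { false, true } // one match for the second face
        };
        Console.WriteLine(AnyMatch(comparisons)); // prints "True"
    }
}
```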
The running time is between 30-40 ms, but nothing is really being compared, because FaceRecognition.FaceLocations() returns a zero-length result.
Thank you for providing test code. Please give me some time to check and reproduce.
I found that Native.loss_metric_operator_matrixs() takes most of the time.
If you use NuGet, could you tell me which FaceRecognitionDotNet version you use? I always test the latest code, so this issue may only happen in a certain version.
Today I can work on this issue because I have nothing else to do, so I need more clues.
I am using the source code cloned from the repository on September 8, 2018, including DlibDotNet.Native.
I tried adding FaceRecognitionDotNet and DlibDotNet from NuGet, but it shows the error "Failed to add reference to 'DlibDotNet.Native'" when I add DlibDotNet. Do I need to compile DlibDotNet.Native from source using CMake?
Yeah, I know this issue. Sorry for troubling you.
You are welcome, thank you very much for your work.
I published new FaceRecognitionDotNet package 1.2.3.2. It works fine.
I was able to reproduce it. Yes, the library cannot detect a face in lenna.jpg, so FaceEncodings returns a zero-length array. However, that does not mean FaceRecognitionDotNet does not work.
The below image works fine. https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/President_Barack_Obama.jpg/512px-President_Barack_Obama.jpg
So I must check the original face_recognition source.
I checked face_recognition on Ubuntu 16.04.4 using Python. In conclusion, no face was detected in the Lenna image there either.
The failure to detect the face is a problem for me: in my business logic, if no face is detected, the logic stops.
I tested again with the latest code. The conclusion is that if no face is detected in the image, FaceEncodings() is very fast; if a face is detected, FaceEncodings() is very slow.
I cannot conclude yet, but the performance issue may be caused by dlib. So I will prepare a real Linux machine and measure the performance of face_recognition.
I created a console demo with DlibDotNet 19.15.0.20180911 and FaceRecognitionDotNet 1.2.3.2 installed from nuget.org. The test code is as follows:
```csharp
long total = 0;
long max = 0;

public void EncodingTest()
{
    var imgtarget = FaceRecognition.LoadImageFile("512px-President_Barack_Obama.jpg");
    int cnt = 0;
    while (cnt++ < 100)
    {
        var lo = _FaceRecognition.FaceLocations(imgtarget);
        var watch = System.Diagnostics.Stopwatch.StartNew();
        var tmp = _FaceRecognition.FaceEncodings(imgtarget).ToArray();
        watch.Stop();
        total += watch.ElapsedMilliseconds;
        if (watch.ElapsedMilliseconds > max)
            max = watch.ElapsedMilliseconds;
    }
    double avg = total / 100.0;
    Console.Write(string.Format("Max: {0}, Avg: {1}", max, avg));
    Console.ReadLine();
}
```
With the test image https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/President_Barack_Obama.jpg/512px-President_Barack_Obama.jpg, two face locations are detected, and the output is:
Max: 1378, Avg: 1015.57
With the test image https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png, no face is detected, and the output is:
Max: 166, Avg: 113.19
On macOS 10.13.6, I checked the above two images using face_recognition. The output is as follows:
FaceRecognitionDotNet detected two faces in 512px-President_Barack_Obama.jpg, and they are duplicates. FaceRecognitionDotNet does not detect any face in Lenna_(test_image).png.
The test code is as follows:
```csharp
public void EncodingTest(string file)
{
    var imgtarget = FaceRecognition.LoadImageFile(file);
    int cnt = 0;
    long total = 0;
    long max = 0;
    var lo = _FaceRecognition.FaceLocations(imgtarget);
    foreach (var i in lo)
        Console.WriteLine(string.Format("{0}: {1},{2},{3},{4}", file, i.Left, i.Top, i.Right, i.Bottom));
    while (cnt++ < 100)
    {
        var watch = System.Diagnostics.Stopwatch.StartNew();
        var tmp = _FaceRecognition.FaceEncodings(imgtarget).ToArray();
        watch.Stop();
        total += watch.ElapsedMilliseconds;
        if (watch.ElapsedMilliseconds > max)
            max = watch.ElapsedMilliseconds;
    }
    double avg = total / 100.0;
    Console.WriteLine(string.Format("{0}: Max - {1}, Avg - {2}", file, max, avg));
}
```
The output is as follows:

```
512px-President_Barack_Obama.jpg: 189,79,314,203
512px-President_Barack_Obama.jpg: 189,79,314,203
512px-President_Barack_Obama.jpg: Max - 2077, Avg - 1205.05
Lenna_(test_image).png: Max - 241, Avg - 202.45
```
I reproduced it on macOS with Python:

```
$ python3 face_detection_cli.py ~/Work/tmp/face_recognition/
/Users/spitzbergen/Work/tmp/face_recognition/512px-President_Barack_Obama.jpg,79,314,203,189
/Users/spitzbergen/Work/tmp/face_recognition/Lenna.png,228,377,377,228
```
And on Windows with Python:

```
(D:\Works\Python\Envs\face_recognition) d:\Works\Local\face_recognition\face_recognition>face_detection .
.\512px-President_Barack_Obama.jpg,79,314,203,189
.\Lenna.png,228,377,377,228
```
So FaceRecognitionDotNet or DlibDotNet has something wrong. If this is corrected, might performance improve?
The results of FaceLocations() differ from face_locations(), so the FaceLocations bug may need to be fixed before improving performance.
NOTE
benchmark.py on Windows:

```
(D:\Works\Python\Envs\face_recognition) D:\Works\Local\face_recognition\examples>python benchmark.py
Benchmarks (Note: All benchmarks are only using a single CPU core)
Timings at 240p:
 - Face locations: 0.0518s (19.32 fps)
 - Face landmarks: 0.0022s (461.01 fps)
 - Encode face (inc. landmarks): 0.0232s (43.18 fps)
 - End-to-end: 0.0802s (12.46 fps)
Timings at 480p:
 - Face locations: 0.2024s (4.94 fps)
 - Face landmarks: 0.0022s (451.43 fps)
 - Encode face (inc. landmarks): 0.0224s (44.64 fps)
 - End-to-end: 0.2438s (4.10 fps)
Timings at 720p:
 - Face locations: 0.4559s (2.19 fps)
 - Face landmarks: 0.0022s (446.57 fps)
 - Encode face (inc. landmarks): 0.0233s (42.87 fps)
 - End-to-end: 0.5005s (2.00 fps)
Timings at 1080p:
 - Face locations: 1.0234s (0.98 fps)
 - Face landmarks: 0.0022s (450.31 fps)
 - Encode face (inc. landmarks): 0.0223s (44.89 fps)
 - End-to-end: 1.0650s (0.94 fps)
```
Performance on Windows 10 on my laptop (i7-6500U):

```
Benchmarks (Note: All benchmarks are only using a single CPU core)
Timings at 240p:
 - Face locations: 0.0772s (12.96 fps)
 - Face landmarks: 0.0032s (317.40 fps)
 - Encode face (inc. landmarks): 0.4384s (2.28 fps)
 - End-to-end: 0.5634s (1.77 fps)
Timings at 480p:
 - Face locations: 0.3179s (3.15 fps)
 - Face landmarks: 0.0032s (313.71 fps)
 - Encode face (inc. landmarks): 0.4389s (2.28 fps)
 - End-to-end: 0.7478s (1.34 fps)
Timings at 720p:
 - Face locations: 0.6904s (1.45 fps)
 - Face landmarks: 0.0032s (314.69 fps)
 - Encode face (inc. landmarks): 0.4395s (2.28 fps)
 - End-to-end: 1.1304s (0.88 fps)
Timings at 1080p:
 - Face locations: 1.5497s (0.65 fps)
 - Face landmarks: 0.0034s (295.68 fps)
 - Encode face (inc. landmarks): 0.4386s (2.28 fps)
 - End-to-end: 1.9922s (0.50 fps)
```
From the dlib example program 'face_detection_ex':
```cpp
cout << "processing image " << argv[i] << endl;
array2d<unsigned char> img;
load_image(img, argv[i]);
// Make the image bigger by a factor of two. This is useful since
// the face detector looks for faces that are about 80 by 80 pixels
// or larger. Therefore, if you want to find faces that are smaller
// than that then you need to upsample the image as we do here by
// calling pyramid_up(). So this will allow it to detect faces that
// are at least 40 by 40 pixels in size. We could call pyramid_up()
// again to find even smaller faces, but note that every time we
// upsample the image we make the detector run slower since it must
// process a larger image.
pyramid_up(img);

// Now tell the face detector to give us a list of bounding boxes
// around all the faces it can find in the image.
std::vector<rectangle> dets = detector(img);
```
In FaceRecognitionDotNet, the detector is called with 1 as the second argument and pyramid_up is not called. So I tried changing the logic and comparing the results. First, I changed it to:
```cpp
cout << "processing image " << argv[i] << endl;
array2d<unsigned char> img;
load_image(img, argv[i]);
// (pyramid_up comment as above)
//pyramid_up(img);

// Now tell the face detector to give us a list of bounding boxes
// around all the faces it can find in the image.
std::vector<rectangle> dets = detector(img, 1);
```
And I pass the following image to it. Then I changed it to:
```cpp
cout << "processing image " << argv[i] << endl;
array2d<unsigned char> img;
load_image(img, argv[i]);
// (pyramid_up comment as above)
pyramid_up(img);

// Now tell the face detector to give us a list of bounding boxes
// around all the faces it can find in the image.
std::vector<rectangle> dets = detector(img, 1);
```
But face_recognition returns locations that correspond to the original-scale image, so pyramid_up should not be called. Then I changed it to:
```cpp
cout << "processing image " << argv[i] << endl;
array2d<unsigned char> img;
load_image(img, argv[i]);
// (pyramid_up comment as above)
//pyramid_up(img);

// Now tell the face detector to give us a list of bounding boxes
// around all the faces it can find in the image.
std::vector<rectangle> dets = detector(img, 0);
```
Perhaps the second argument of face_detector is passed through to the C++ side in face_recognition, or number_of_times_to_upsample may be 0:
```python
def _raw_face_locations(img, number_of_times_to_upsample=1, model="hog"):
    """
    Returns an array of bounding boxes of human faces in a image

    :param img: An image (as a numpy array)
    :param number_of_times_to_upsample: How many times to upsample the image looking for faces. Higher numbers find smaller faces.
    :param model: Which face detection model to use. "hog" is less accurate but faster on CPUs. "cnn" is a more accurate
                  deep-learning model which is GPU/CUDA accelerated (if available). The default is "hog".
    :return: A list of dlib 'rect' objects of found face locations
    """
    if model == "cnn":
        return cnn_face_detector(img, number_of_times_to_upsample)
    else:
        return face_detector(img, number_of_times_to_upsample)
```
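For intuition about what number_of_times_to_upsample buys: per the face_detection_ex comment above, dlib's HOG detector finds faces of roughly 80x80 pixels or larger, and each upsampling pass doubles the image, halving the minimum detectable face size. A sketch of that arithmetic (the 80-pixel base window is taken from the dlib comment; the helper itself is hypothetical):

```csharp
using System;

static class UpsampleSketch
{
    // Minimum detectable face size (in original-image pixels) after
    // upsampling the image N times, assuming dlib's ~80x80 detector window.
    public static int MinDetectableFaceSize(int numberOfTimesToUpsample)
    {
        int minFace = 80;
        for (int i = 0; i < numberOfTimesToUpsample; i++)
            minFace /= 2; // each pyramid_up doubles the image dimensions
        return minFace;
    }

    static void Main()
    {
        for (int n = 0; n <= 2; n++)
            Console.WriteLine($"upsample {n}x -> min face {MinDetectableFaceSize(n)}px");
        // upsample 0x -> min face 80px
        // upsample 1x -> min face 40px
        // upsample 2x -> min face 20px
    }
}
```

This is why upsampling more finds smaller faces at the cost of processing a larger image.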
Hi yuzifu, thank you for your kind support. I found that your 'Encode face' performance is much slower than in my benchmark; the other measurements are not bad.
The problem you face is FaceEncodings performance. Could the performance issue be environmental?
On macOS 10.13.6 on the same laptop (i7-6500U), FaceEncodings performs much better:
```
Benchmarks (Note: All benchmarks are only using a single CPU core)
Timings at 240p:
 - Face locations: 0.0612s (16.33 fps)
 - Face landmarks: 0.0020s (505.57 fps)
 - Encode face (inc. landmarks): 0.0272s (36.78 fps)
 - End-to-end: 0.0883s (11.33 fps)
Timings at 480p:
 - Face locations: 0.2355s (4.25 fps)
 - Face landmarks: 0.0020s (499.46 fps)
 - Encode face (inc. landmarks): 0.0269s (37.14 fps)
 - End-to-end: 0.2687s (3.72 fps)
Timings at 720p:
 - Face locations: 0.5387s (1.86 fps)
 - Face landmarks: 0.0020s (509.91 fps)
 - Encode face (inc. landmarks): 0.0257s (38.97 fps)
 - End-to-end: 0.5650s (1.77 fps)
Timings at 1080p:
 - Face locations: 1.2311s (0.81 fps)
 - Face landmarks: 0.0020s (497.55 fps)
 - Encode face (inc. landmarks): 0.0270s (37.10 fps)
 - End-to-end: 1.2567s (0.80 fps)
```
masoudr said (https://github.com/ageitgey/face_recognition/issues/175#issue-257710508):

> the performance of this tool in Windows 10 was about a quarter in comparison with Ubuntu built with the same specs.
On your Windows 10 machine, face_recognition performs better than on mine; is your CPU better than mine? And do you have plans to fix FaceLocations in FaceRecognitionDotNet, which behaves differently from face_locations in face_recognition?
My machine uses an i7-8700.
> Do you have plans to fix FaceLocations in FaceRecognitionDotNet, which behaves differently from face_locations in face_recognition?
No. For now, the FaceLocations method is not completely wrong. I suspect FRDotNet may be using the detector incorrectly. About this, please refer to the NOTE about face_detection_ex.
At least, changing the default value of the RawFaceLocations and FaceLocations argument to 0 will make them return the same result as face_recognition.
```
D:\Works\Lib\DLib\19.15\examples\build\MSVC14.1\64\Release>dnn_face_recognition_ex.exe 200px-President_Barack_Obama.jpg
277ms
number of people found in the image: 1
```
```cpp
// This call asks the DNN to convert each face image in faces into a 128D vector.
// In this 128D vector space, images from the same person will be close to each other
// but vectors from different people will be far apart. So we can use these vectors to
// identify if a pair of images are from the same person or from different people.
std::chrono::system_clock::time_point start, end;
start = std::chrono::system_clock::now();
std::vector<matrix<float,0,1>> face_descriptors = net(faces);
end = std::chrono::system_clock::now();
double elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
cout << elapsed << "ms" << endl;
```
The performance of the original dlib sample is similar to FRDotNet's. It could be explained by the following build outputs:
Python-side dlib build:

```
-- pybind11 v2.2.2
-- Using CMake version: 3.11.0
-- Compiling dlib version: 19.15.0
-- SSE4 instructions can be executed by the host processor.
-- AVX instructions can be executed by the host processor.
-- Enabling AVX instructions
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: D:/Works/Lib/DLib/19.15/build/temp.win-amd64-3.6/Release
```

DlibDotNet.Native.Dnn build:

```
-- Using CMake version: 3.11.0
-- Compiling dlib version: 19.15.0
-- Enabling SSE2 instructions
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: D:/Works/OpenSource/DlibDotNet/src/DlibDotNet.Native.Dnn/build_cpu
```
I checked CMakeCache.txt, and the Intel MKL library is used as BLAS on the Python side.
DLib\19.15\build\temp.win-amd64-3.6\Release\CMakeCache.txt:

```
//Path to a library.
BLAS_Accelerate_LIBRARY:FILEPATH=BLAS_Accelerate_LIBRARY-NOTFOUND
//Path to a library.
BLAS_acml_LIBRARY:FILEPATH=BLAS_acml_LIBRARY-NOTFOUND
//Path to a library.
BLAS_acml_mp_LIBRARY:FILEPATH=BLAS_acml_mp_LIBRARY-NOTFOUND
//Path to a library.
BLAS_blas_LIBRARY:FILEPATH=BLAS_blas_LIBRARY-NOTFOUND
//Path to a library.
BLAS_blis_LIBRARY:FILEPATH=BLAS_blis_LIBRARY-NOTFOUND
//Path to a library.
BLAS_complib.sgimath_LIBRARY:FILEPATH=BLAS_complib.sgimath_LIBRARY-NOTFOUND
//Path to a library.
BLAS_cxml_LIBRARY:FILEPATH=BLAS_cxml_LIBRARY-NOTFOUND
//Path to a library.
BLAS_dxml_LIBRARY:FILEPATH=BLAS_dxml_LIBRARY-NOTFOUND
//Path to a library.
BLAS_essl_LIBRARY:FILEPATH=BLAS_essl_LIBRARY-NOTFOUND
//Path to a library.
BLAS_f77blas_LIBRARY:FILEPATH=BLAS_f77blas_LIBRARY-NOTFOUND
//Path to a library.
BLAS_goto2_LIBRARY:FILEPATH=BLAS_goto2_LIBRARY-NOTFOUND
//Path to a library.
BLAS_libguide40_LIBRARY:FILEPATH=BLAS_libguide40_LIBRARY-NOTFOUND
//Path to a library.
BLAS_libiomp5md_LIBRARY:FILEPATH=C:/Program Files (x86)/Microsoft Visual Studio/Shared/Anaconda3_64/Library/lib/libiomp5md.lib
//Path to a library.
BLAS_mkl_core_dll_LIBRARY:FILEPATH=C:/Program Files (x86)/Microsoft Visual Studio/Shared/Anaconda3_64/Library/lib/mkl_core_dll.lib
//Path to a library.
BLAS_mkl_intel_c_dll_LIBRARY:FILEPATH=BLAS_mkl_intel_c_dll_LIBRARY-NOTFOUND
//Path to a library.
BLAS_mkl_intel_lp64_dll_LIBRARY:FILEPATH=C:/Program Files (x86)/Microsoft Visual Studio/Shared/Anaconda3_64/Library/lib/mkl_intel_lp64_dll.lib
//Path to a library.
BLAS_mkl_intel_thread_dll_LIBRARY:FILEPATH=C:/Program Files (x86)/Microsoft Visual Studio/Shared/Anaconda3_64/Library/lib/mkl_intel_thread_dll.lib
//Path to a library.
BLAS_openblas_LIBRARY:FILEPATH=BLAS_openblas_LIBRARY-NOTFOUND
//Path to a library.
BLAS_scsl_LIBRARY:FILEPATH=BLAS_scsl_LIBRARY-NOTFOUND
//Path to a library.
BLAS_sgemm_LIBRARY:FILEPATH=BLAS_sgemm_LIBRARY-NOTFOUND
//Path to a library.
BLAS_sunperf_LIBRARY:FILEPATH=BLAS_sunperf_LIBRARY-NOTFOUND
//Path to a library.
BLAS_vecLib_LIBRARY:FILEPATH=BLAS_vecLib_LIBRARY-NOTFOUND
//Compile your program with AVX instructions
USE_AVX_INSTRUCTIONS:BOOL=ON
//Install pybind11 headers in Python include directory instead
// of default installation prefix
USE_PYTHON_INCLUDE_DIR:BOOL=OFF
//Compile your program with SSE2 instructions
USE_SSE2_INSTRUCTIONS:BOOL=ON
//Compile your program with SSE4 instructions
USE_SSE4_INSTRUCTIONS:BOOL=ON
```
FRDotNet detects two locations for each face.
Environment: Windows 10 64-bit, .NET 4.7, DlibDotNet 19.15.0.20180913 from nuget.org, FaceRecognitionDotNet 1.2.3.2 from nuget.org.
```csharp
public void LocationsTest(string file)
{
    var imgtarget = FaceRecognition.LoadImageFile(file);
    var lo = _FaceRecognition.FaceLocations(imgtarget);
    if (lo.Count() > 0)
    {
        foreach (var i in lo)
            Console.WriteLine(string.Format("{0}: {1},{2},{3},{4}", file, i.Left, i.Top, i.Right, i.Bottom));
    }
    else
    {
        Console.WriteLine(string.Format("{0}: no faces detected", file));
    }
}
```
1. With the default numberOfTimesToUpsample:
> 512px-President_Barack_Obama.jpg: 189,79,314,203
> 512px-President_Barack_Obama.jpg: 189,79,314,203
> Lenna_(test_image).png: no faces detected
2. With numberOfTimesToUpsample = 0:
> 512px-President_Barack_Obama.jpg: 189,79,314,203
> 512px-President_Barack_Obama.jpg: 189,79,314,203
> Lenna_(test_image).png: 228,228,377,377
> Lenna_(test_image).png: 228,228,377,377
So setting numberOfTimesToUpsample to 0 solves the problem of the face not being detected, but it introduces the problem of duplicate face locations.
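Until the duplicate-detection bug is fixed, the repeated locations can be collapsed by their coordinates. A sketch using value tuples in place of FaceRecognitionDotNet's location type (the tuple representation is my assumption; value tuples compare by value, so Distinct() drops exact duplicates):

```csharp
using System;
using System.Linq;

static class DedupeSketch
{
    // Collapse exact duplicate face locations by coordinate value.
    public static (int Left, int Top, int Right, int Bottom)[] Unique(
        (int Left, int Top, int Right, int Bottom)[] locations) =>
        locations.Distinct().ToArray();

    static void Main()
    {
        // The duplicated Lenna result from the output above.
        var locations = new[]
        {
            (Left: 228, Top: 228, Right: 377, Bottom: 377),
            (Left: 228, Top: 228, Right: 377, Bottom: 377)
        };
        Console.WriteLine(Unique(locations).Length); // prints "1"
    }
}
```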
UPDATE: the DlibDotNet version is 19.15.0.20180913.
I added a printf to the dlib source code and found that the argument of face_locations is not passed through to the dlib side.
```cpp
template <
    typename image_scanner_type
    >
template <
    typename image_type
    >
void object_detector<image_scanner_type>::
operator() (
    const image_type& img,
    std::vector<rect_detection>& final_dets,
    double adjust_threshold
)
{
    printf("adjust_threshold = %f\n", adjust_threshold); // added for debugging (%f, since adjust_threshold is a double)
    scanner.load(img);
    std::vector<std::pair<double, rectangle> > dets;
    std::vector<rect_detection> dets_accum;
    for (unsigned long i = 0; i < w.size(); ++i)
    {
        const double thresh = w[i].w(scanner.get_num_dimensions());
        scanner.detect(w[i].get_detect_argument(), dets, thresh + adjust_threshold);
        for (unsigned long j = 0; j < dets.size(); ++j)
        {
            rect_detection temp;
            temp.detection_confidence = dets[j].first - thresh;
            temp.weight_index = i;
            temp.rect = dets[j].second;
            dets_accum.push_back(temp);
        }
    }
    // ...
```
adjust_threshold was 0.
But in face_recognition:

```python
def face_locations(img, number_of_times_to_upsample=1, model="hog"):
    """
    Returns an array of bounding boxes of human faces in a image

    :param img: An image (as a numpy array)
    :param number_of_times_to_upsample: How many times to upsample the image looking for faces. Higher numbers find smaller faces.
    :param model: Which face detection model to use. "hog" is less accurate but faster on CPUs. "cnn" is a more accurate
                  deep-learning model which is GPU/CUDA accelerated (if available). The default is "hog".
    :return: A list of tuples of found face locations in css (top, right, bottom, left) order
    """
    if model == "cnn":
        return [_trim_css_to_bounds(_rect_to_css(face.rect), img.shape) for face in _raw_face_locations(img, number_of_times_to_upsample, "cnn")]
    else:
        return [_trim_css_to_bounds(_rect_to_css(face), img.shape) for face in _raw_face_locations(img, number_of_times_to_upsample, model)]


def _raw_face_locations(img, number_of_times_to_upsample=1, model="hog"):
    """
    Returns an array of bounding boxes of human faces in a image

    :param img: An image (as a numpy array)
    :param number_of_times_to_upsample: How many times to upsample the image looking for faces. Higher numbers find smaller faces.
    :param model: Which face detection model to use. "hog" is less accurate but faster on CPUs. "cnn" is a more accurate
                  deep-learning model which is GPU/CUDA accelerated (if available). The default is "hog".
    :return: A list of dlib 'rect' objects of found face locations
    """
    if model == "cnn":
        return cnn_face_detector(img, number_of_times_to_upsample)
    else:
        return face_detector(img, number_of_times_to_upsample)
```
So number_of_times_to_upsample appears to be meaningless here; perhaps this function always returns the same result even if the number_of_times_to_upsample value is changed.
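The suspected bug pattern, in miniature: a wrapper that accepts the parameter but never forwards it behaves identically for every value. A hypothetical sketch (not the actual FaceRecognitionDotNet code; both helpers are made up):

```csharp
using System;

static class ForwardingSketch
{
    // Hypothetical broken wrapper: the upsample count is accepted but ignored,
    // so every call behaves as if the same value were passed.
    public static int DetectBroken(int imageSize, int numberOfTimesToUpsample)
        => imageSize;

    // Hypothetical fixed wrapper: the count actually changes the processed image size.
    public static int DetectFixed(int imageSize, int numberOfTimesToUpsample)
        => imageSize << numberOfTimesToUpsample; // each upsample doubles the image

    static void Main()
    {
        Console.WriteLine(DetectBroken(512, 0) == DetectBroken(512, 1)); // prints "True"
        Console.WriteLine(DetectFixed(512, 0) == DetectFixed(512, 1));   // prints "False"
    }
}
```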
Happy news!!! The Intel MKL library improves FaceEncodings performance, and FRDotNet can achieve even higher performance than Python!!
```
(D:\Works\Python\Envs\face_recognition) D:\Works\Local\face_recognition\examples>python benchmark.py
Benchmarks (Note: All benchmarks are only using a single CPU core)
Timings at 240p:
 - Face locations: 0.0518s (19.32 fps)
 - Face landmarks: 0.0022s (461.01 fps)
 - Encode face (inc. landmarks): 0.0232s (43.18 fps)
 - End-to-end: 0.0802s (12.46 fps)
Timings at 480p:
 - Face locations: 0.2024s (4.94 fps)
 - Face landmarks: 0.0022s (451.43 fps)
 - Encode face (inc. landmarks): 0.0224s (44.64 fps)
 - End-to-end: 0.2438s (4.10 fps)
Timings at 720p:
 - Face locations: 0.4559s (2.19 fps)
 - Face landmarks: 0.0022s (446.57 fps)
 - Encode face (inc. landmarks): 0.0233s (42.87 fps)
 - End-to-end: 0.5005s (2.00 fps)
Timings at 1080p:
 - Face locations: 1.0234s (0.98 fps)
 - Face landmarks: 0.0022s (450.31 fps)
 - Encode face (inc. landmarks): 0.0223s (44.89 fps)
 - End-to-end: 1.0650s (0.94 fps)
```
```
D:\Works\OpenSource\FaceRecognitionDotNet\examples\Benchmark>dotnet run -c Release -- "-m=models"
Using launch settings from D:\Works\OpenSource\FaceRecognitionDotNet\examples\Benchmark\Properties\launchSettings.json...
Benchmarks
Timings at 240p:
 - Face locations: 0.0268s (37.31 fps)
 - Face landmarks: 0.0014s (714.29 fps)
 - Encode face (inc. landmarks): 0.0210s (47.62 fps)
 - End-to-end: 0.0484s (20.66 fps)
Timings at 480p:
 - Face locations: 0.1068s (9.36 fps)
 - Face landmarks: 0.0014s (714.29 fps)
 - Encode face (inc. landmarks): 0.0202s (49.50 fps)
 - End-to-end: 0.1308s (7.65 fps)
Timings at 720p:
 - Face locations: 0.2416s (4.14 fps)
 - Face landmarks: 0.0014s (714.29 fps)
 - Encode face (inc. landmarks): 0.0206s (48.54 fps)
 - End-to-end: 0.2700s (3.70 fps)
Timings at 1080p:
 - Face locations: 0.5430s (1.84 fps)
 - Face landmarks: 0.0016s (625.00 fps)
 - Encode face (inc. landmarks): 0.0206s (48.54 fps)
 - End-to-end: 0.5774s (1.73 fps)
```
After the fix for #8, face-locations performance improved. After fix #8:
```
D:\Works\OpenSource\FaceRecognitionDotNet\examples\Benchmark>dotnet run -c Release "-m=models"
Using launch settings from D:\Works\OpenSource\FaceRecognitionDotNet\examples\Benchmark\Properties\launchSettings.json...
Benchmarks
Timings at 240p:
 - Face locations: 0.0140s (71.43 fps)
 - Face landmarks: 0.0016s (625.00 fps)
 - Encode face (inc. landmarks): 0.0216s (46.30 fps)
 - End-to-end: 0.0370s (27.03 fps)
Timings at 480p:
 - Face locations: 0.0566s (17.67 fps)
 - Face landmarks: 0.0016s (625.00 fps)
 - Encode face (inc. landmarks): 0.0228s (43.86 fps)
 - End-to-end: 0.0870s (11.49 fps)
Timings at 720p:
 - Face locations: 0.1282s (7.80 fps)
 - Face landmarks: 0.0016s (625.00 fps)
 - Encode face (inc. landmarks): 0.0216s (46.30 fps)
 - End-to-end: 0.1716s (5.83 fps)
Timings at 1080p:
 - Face locations: 0.2876s (3.48 fps)
 - Face landmarks: 0.0016s (625.00 fps)
 - Encode face (inc. landmarks): 0.0214s (46.73 fps)
 - End-to-end: 0.3370s (2.97 fps)
```
:+1:
Thank you for helping me to the end!!
@takuya-takeuchi Does DlibDotNet 19.15.0.20180916 use Intel MKL? On Windows 10 64-bit, I tested FRDotNet 1.2.3.4's Benchmark and face_recognition 1.2.3's benchmark.py and found no difference in their performance.
No. I don't know whether Intel allows OSS developers to build MKL into source code or embed it in distributed binaries. If you know, please let me know.
I created a WPF demo for face comparison; the code is as follows:
By tracking the running time, I found that executing FaceEncodings() takes a lot of time: at least 600 milliseconds, and often more than 1 second.