takuya-takeuchi / DlibDotNet

Dlib .NET wrapper written in C++ and C# for Windows, MacOS, Linux and iOS
MIT License

Face Recognition Train #187


danijerez commented 4 years ago

Hi!

First of all, thank you for this great project. I've been trying for a few days to do face recognition against a trained photo, but I can't find specific documentation.

I wanted to save the landmark detection results and then compare them against another image to decide whether the face is recognized.

private static void CreateNewDataset(string filename)
{
    List<string> imgOption = new List<string>();
    imgOption.Add(@"faces\Dani.png");

    MakeEmptyFile(filename);
    var parentDir = Path.GetDirectoryName(Path.GetFullPath(filename));

    var depth = 0;

    using (var meta = new Dataset())
    {
        var images = meta.Images;
        for (var i = 0; i < imgOption.Count; ++i)
        {
            var arg = imgOption[i];

            try
            {
                if (!File.Exists(arg))
                    throw new FileNotFoundException();

                var temp = StripPath(arg, parentDir);
                images.Add(new DlibDotNet.ImageDatasetMetadata.Image(temp));
            }
            catch (FileNotFoundException)
            {
                // then imgOption[i] should be a directory
                const string ext = "(.png|.PNG|.jpeg|.JPEG|.jpg|.JPG|.bmp|.BMP|.dng|.DNG|.gif|.GIF)$";
                var files = GetFilesMostDeep(arg, ext, depth).ToList();
                files.Sort();

                foreach (var t in files)
                    images.Add(new DlibDotNet.ImageDatasetMetadata.Image(StripPath(t, parentDir)));
            }

            // note: this could be moved outside the loop to save only once
            Dlib.ImageDatasetMetadata.SaveImageDatasetMetadata(meta, filename);
        }
    }
}

How can I load and compare the training?

I also tried it like this:

OpenFileDialog open = new OpenFileDialog();
open.Filter = "Image Files(*.jpg; *.jpeg; *.gif; *.bmp)|*.jpg; *.jpeg; *.gif; *.bmp";
if (open.ShowDialog() == DialogResult.OK)
{
    var img = Dlib.LoadImage<RgbPixel>(open.FileName);
    Array<Array2D<RgbPixel>> imagesTrain = new Array<Array2D<RgbPixel>>();
    List<List<FullObjectDetection>> a = new List<List<FullObjectDetection>>();
    List<FullObjectDetection> shapes = landmarkDetection(img);
    a.Add(shapes);
    imagesTrain.PushBack(img);
    using (var trainer = new ShapePredictorTrainer())
    {
        trainer.OverSamplingAmount = 300;
        trainer.Nu = 0.05d;
        trainer.TreeDepth = 2;
        trainer.NumThreads = 2;
        trainer.BeVerbose();

        using (var sp = trainer.Train(imagesTrain, a))
        {
            ShapePredictor.Serialize(sp, @"faces\train.dat");
        }
    }
}

Do you have an example, even a very simple one? In Python I did get it to work.

takuya-takeuchi commented 4 years ago

@danijerez You want to train a shape predictor, right? You can refer to this example: https://github.com/takuya-takeuchi/DlibDotNet/blob/master/examples/TrainShapePredictor/Program.cs

You should be able to achieve what you want by using LoadImageDataset.

SaveImageDatasetMetadata saves the metadata of the training data. The metadata consists of the bounding boxes and references to the image file names.

Therefore, you must copy the image files into the directory that contains the metadata *.xml.
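For reference, the metadata follows dlib's imglab XML format, where image paths are resolved relative to the XML file itself. A minimal sketch (the file name, box coordinates, and part positions below are placeholders, not values from this thread):

```xml
<?xml version='1.0' encoding='ISO-8859-1'?>
<dataset>
  <name>Training faces</name>
  <images>
    <!-- 'file' is resolved relative to this XML, so the image must sit next to it -->
    <image file='Dani.png'>
      <box top='57' left='80' width='150' height='150'>
        <!-- landmark parts; a 68-point shape uses names '00' .. '67' -->
        <part name='00' x='90' y='120'/>
      </box>
    </image>
  </images>
</dataset>
```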

danijerez commented 4 years ago

Thanks for answering! But then how do I use the training? That is where I am most lost; I don't know what to do with the XML file that is generated.

takuya-takeuchi commented 4 years ago

@danijerez

Just in case, let me confirm: did you create the XML file? And does it have face landmark information?

These steps show how to train face detection, but you can follow the same steps to train face landmark detection. Of course, you must add the face landmark information to the XML file. You can refer to https://github.com/takuya-takeuchi/FaceRecognitionDotNet/blob/master/tools/HelenTraining/Program.cs

  1. Prepare training data

(screenshot of training data)

  2. Write training code
var facesDirectory = args[0];
Array<Array2D<byte>> imagesTrain;
IList<IList<FullObjectDetection>> faces_train;

// Load image and bounding box data from the xml file which you created
Dlib.LoadImageDataset("training.xml", out imagesTrain, out faces_train);

using (var trainer = new ShapePredictorTrainer())
{
    // Training configuration. You should check the dlib documentation rather than DlibDotNet's
    trainer.OverSamplingAmount = 300;
    trainer.Nu = 0.05d;
    trainer.TreeDepth = 2;
    trainer.NumThreads = 2;
    trainer.BeVerbose();

    using (var sp = trainer.Train(imagesTrain, faces_train))
    {
        // TestShapePredictor can measure the accuracy of the training result
        Console.WriteLine($"mean training error: {Dlib.TestShapePredictor(sp, imagesTrain, faces_train, GetInterocularDistances(faces_train))}");

        // create the model file
        ShapePredictor.Serialize(sp, "sp.dat");
    }
}
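To then apply the trained model, a rough sketch (the file names here are placeholders; the API names follow DlibDotNet's published examples, so double-check them against your version):

```csharp
using System;
using DlibDotNet;

using (var detector = Dlib.GetFrontalFaceDetector())
using (var sp = ShapePredictor.Deserialize("sp.dat"))
using (var img = Dlib.LoadImage<RgbPixel>("test.jpg"))
{
    // find faces, then predict landmarks inside each face rectangle
    foreach (var face in detector.Operator(img))
    {
        using (var shape = sp.Detect(img, face))
        {
            for (uint i = 0; i < shape.Parts; i++)
            {
                var point = shape.GetPart(i);
                Console.WriteLine($"part {i}: ({point.X}, {point.Y})");
            }
        }
    }
}
```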
danijerez commented 4 years ago

My intention was to replicate this Python code and detect faces from the camera image, but I don't know how to use the dataset that I have trained and evaluate a face to know if it is the one I trained on. Sorry for all the questions.

import face_recognition
import cv2
import numpy as np
import os

known_face_encodings = []
known_face_names= []
path = 'images/'

for root, dirs, files in os.walk(path):
    for filename in files:
        image = face_recognition.load_image_file(path+filename)
        face_encoding = face_recognition.face_encodings(image)[0]
        known_face_encodings.insert(1, face_encoding)
        known_face_names.insert(1, filename.split('.')[0])

# Get a reference to webcam #0 (the default one)
video_capture = cv2.VideoCapture(0)

# Initialize some variables
face_locations = []
face_encodings = []
face_names = []
process_this_frame = True

while True:
    # Grab a single frame of video
    ret, frame = video_capture.read()

    # Resize frame of video to 1/4 size for faster face recognition processing
    small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)

    # Convert the image from BGR color (which OpenCV uses) to RGB color (which face_recognition uses)
    rgb_small_frame = small_frame[:, :, ::-1]

    # Only process every other frame of video to save time
    if process_this_frame:
        # Find all the faces and face encodings in the current frame of video
        face_locations = face_recognition.face_locations(rgb_small_frame)
        face_encodings = face_recognition.face_encodings(rgb_small_frame, face_locations)

        face_names = []
        for face_encoding in face_encodings:
            # See if the face is a match for the known face(s)
            matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
            name = "Unknown"

            # # If a match was found in known_face_encodings, just use the first one.
            # if True in matches:
            #     first_match_index = matches.index(True)
            #     name = known_face_names[first_match_index]

            # Or instead, use the known face with the smallest distance to the new face
            face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
            best_match_index = np.argmin(face_distances)
            if matches[best_match_index]:
                name = known_face_names[best_match_index]

            face_names.append(name)

    process_this_frame = not process_this_frame

    # Display the results
    for (top, right, bottom, left), name in zip(face_locations, face_names):
        # Scale back up face locations since the frame we detected in was scaled to 1/4 size
        top *= 4
        right *= 4
        bottom *= 4
        left *= 4

        # Draw a box around the face
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)

        # Draw a label with a name below the face
        cv2.rectangle(frame, (left, bottom - 35), (right, bottom), (0, 0, 255), cv2.FILLED)
        font = cv2.FONT_HERSHEY_DUPLEX
        cv2.putText(frame, name, (left + 6, bottom - 6), font, 1.0, (255, 255, 255), 1)

    # Display the resulting image
    cv2.imshow('Video', frame)

    # Hit 'q' on the keyboard to quit!
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release handle to the webcam
video_capture.release()
cv2.destroyAllWindows()
input()
takuya-takeuchi commented 4 years ago

@danijerez

I'm very sorry to trouble you due to my poor English.

Let me get this straight.

  1. Did you generate the XML file? You said "use the DataSet that I have trained", so I understand you already have a model file, right?
  2. Do you want to do face recognition? You said "I wanted to save landmarkDetection" and I misunderstood what you wanted to do.
danijerez commented 4 years ago

Oh, nice! Don't worry, my English is not very good either. And can an Array2D be cast to FaceRecognitionDotNet.Image? `IEnumerable<Location> locationsA = fr.FaceLocations(img);` Does CompareFace use Dlib? Thank you very much for the help.

takuya-takeuchi commented 4 years ago

@danijerez

And can an Array2D be cast to FaceRecognitionDotNet.Image?

No. You must create an Image object from a file or from raw image data.

I think you can use LoadImageFile to achieve your purpose.
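A rough sketch of that flow (the model directory and file names are placeholders; the method names follow FaceRecognitionDotNet's public API, but verify them against your version):

```csharp
using System;
using System.Linq;
using FaceRecognitionDotNet;

// "models" must contain the dlib model files FaceRecognitionDotNet expects
using (var fr = FaceRecognition.Create("models"))
using (var known = FaceRecognition.LoadImageFile(@"faces\Dani.png"))
using (var unknown = FaceRecognition.LoadImageFile(@"faces\test.png"))
{
    var knownEncoding = fr.FaceEncodings(known).First();

    foreach (var encoding in fr.FaceEncodings(unknown))
    {
        // compares 1x128 feature vectors; default tolerance mirrors face_recognition's 0.6
        var match = FaceRecognition.CompareFace(encoding, knownEncoding);
        Console.WriteLine(match ? "Dani" : "Unknown");
    }
}
```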

takuya-takeuchi commented 4 years ago

Does CompareFace use Dlib?

Yes. FaceRecognitionDotNet is a C# port of face_recognition.

danijerez commented 4 years ago

I've made it work like this: `Image imageA = FaceRecognition.LoadImage(img.ToBytes(), img.Rows, img.Columns, 3);` although there is a slight delay in the calculation for each frame in which a face is recognized. How could I improve it?

takuya-takeuchi commented 4 years ago

@danijerez How slow is it? I have no data about LoadImage and ToBytes performance.

LoadImage converts byte[] data to a dlib matrix object. A dlib matrix does not use stride, but the input data may have it, so DlibDotNet converts each byte[] pixel value to the destination matrix pixel value one by one. So it could be slow. ToBytes has the same issue.

https://github.com/takuya-takeuchi/DlibDotNet/blob/439e07e908ac9c1ccd7abe32d5f0ffbe4ba5bacf/src/DlibDotNet.Native/dlib/matrix/matrix.h#L31

https://github.com/takuya-takeuchi/DlibDotNet/blob/439e07e908ac9c1ccd7abe32d5f0ffbe4ba5bacf/src/DlibDotNet.Native/dlib/extensions/extensions.h#L27

We would have to modify the above code to improve this, but I cannot say for sure.

danijerez commented 4 years ago

I have a few small doubts.

  1. Why does the face compare (FaceRecognition.CompareFace) algorithm use 5 landmarks (shape_predictor_5_face_landmarks.dat)?
  2. When transforming an OpenCV (Mat) frame into an Array2D, why are colors lost (the image looks more bluish)? I'm converting the frame like this:

capture.Read(image);
var array = new byte[image.Width * image.Height * image.ElemSize()];
Marshal.Copy(image.Data, array, 0, array.Length);

  3. Does DlibDotNet or FaceRecognitionDotNet have a system to capture the webcam?
takuya-takeuchi commented 4 years ago

Why does the face compare (FaceRecognition.CompareFace) algorithm use 5 landmarks?

To be exact, face landmarks are not used for comparing faces. They are used for generating the face feature data (a 1x128 float matrix). The landmarks themselves have nothing to do with face recognition; the face feature data is generated from the face image data.

When transforming an OpenCV (Mat) frame into an Array2D, why are colors lost (the image looks more bluish)?

DlibDotNet requires RGB color rather than BGR. OpenCV and Windows Bitmap use BGR, so we must swap R and B before running face detection and recognition. You can check this document about the performance of RGB vs. BGR conversion: https://github.com/takuya-takeuchi/FaceRecognitionDotNet/tree/develop/examples/RgbBgr

Does DlibDotNet or FaceRecognitionDotNet have a system to capture the webcam?

No. DlibDotNet does not use OpenCV or any camera control component. You must use OpenCvSharp, EmguCV or another 3rd-party component for C#.

danijerez commented 4 years ago

Maybe this is a dumb question, but is there any existing function to switch from BGR to RGB and vice versa?

takuya-takeuchi commented 4 years ago

@danijerez No, current FRDN does not provide one. Yeah, I probably should provide this function. You look like you're using OpenCvSharp, so you can use Cv2.CvtColor and then Mat.Data to get an IntPtr.
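A sketch of that suggestion, assuming OpenCvSharp plus the LoadImage overload used earlier in this thread (verify the names against your versions):

```csharp
using System;
using System.Runtime.InteropServices;
using OpenCvSharp;
using FaceRecognitionDotNet;

using (var capture = new VideoCapture(0))
using (var bgrFrame = new Mat())
using (var rgbFrame = new Mat())
{
    capture.Read(bgrFrame);

    // OpenCV delivers BGR; swap channels before handing pixels to FaceRecognitionDotNet
    Cv2.CvtColor(bgrFrame, rgbFrame, ColorConversionCodes.BGR2RGB);

    // copy the (continuous) Mat buffer into a managed byte[]
    var bytes = new byte[rgbFrame.Rows * rgbFrame.Cols * rgbFrame.ElemSize()];
    Marshal.Copy(rgbFrame.Data, bytes, 0, bytes.Length);

    using (var image = FaceRecognition.LoadImage(bytes, rgbFrame.Rows, rgbFrame.Cols, 3))
    {
        // run fr.FaceLocations(image) / fr.FaceEncodings(image) here
    }
}
```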

danijerez commented 4 years ago

In case it's helpful:

public static void RGBtoBGR(Bitmap bmp)
{
    // lock as 24bpp so the 3-byte-per-pixel walk below is valid; requires /unsafe
    BitmapData data = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height),
                                   ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);

    unsafe
    {
        byte* row = (byte*)data.Scan0.ToPointer();

        // walk row by row so the stride padding at the end of each row is skipped
        for (int y = 0; y < bmp.Height; y++, row += data.Stride)
        {
            for (int x = 0; x < bmp.Width * 3; x += 3)
            {
                byte tmp = row[x];
                row[x] = row[x + 2];
                row[x + 2] = tmp;
            }
        }
    }

    bmp.UnlockBits(data);
}
takuya-takeuchi commented 4 years ago

FYI https://github.com/takuya-takeuchi/DlibDotNet/issues/188