pntt3011 / mediapipe_face_iris_cpp

Real-time Face and Iris Landmarks Detection using C++
GNU General Public License v3.0
81 stars 15 forks

How to adjust this code for hand detection? #13

Closed Vibhu04 closed 2 years ago

Vibhu04 commented 2 years ago

Could someone please guide me as to what changes I should make to the files of this repository to get the hand detection solution running?

pntt3011 commented 2 years ago

Hello @Vibhu04 , have you tried #3 yet?

Vibhu04 commented 2 years ago

Yes, I had a look at #3. The problem is that @AdrianPeniak 's repository, which he had forked from yours to get the hand detection solution running, no longer exists, so I couldn't follow the solution you provided there. I'd be really grateful if you could list the changes that should be made to your repository to make it run the hand detection solution instead.

pntt3011 commented 2 years ago

Oh, I didn't know about that, sorry. Here are the detailed steps (you should rename the classes and variables accordingly, since the existing names still refer to face detection):

  1. Download palm_detection_without_custom_layer.tflite from here.
  2. Download anchors.csv from here.
  3. Move those 2 files to models folder.
  4. In FaceDetection.cpp, replace face_detection_short.tflite with palm_detection_without_custom_layer.tflite.
  5. In DetectionPostProcess.hpp:
    ...
    #define DETECTION_SIZE  192
    #define NUM_BOXES       2944
    #define NUM_COORD       18
    ...

    You are free to delete NUM_SIZE and struct AnchorOptions now.

  6. In DetectionPostProcess.cpp:
    
    #include "DetectionPostProcess.hpp"
    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <string>

    cv::Rect2f convertAnchorVectorToRect(const std::vector<float>& v) {
        float cx = v[0];
        float cy = v[1];
        float w = v[2];
        float h = v[3];
        return cv::Rect2f(cx - w / 2, cy - h / 2, w, h);
    }

    std::vector<cv::Rect2f> generateAnchors() {
        std::vector<cv::Rect2f> anchors;
        std::ifstream file("./models/anchors.csv");
        if (file.is_open()) {
            std::string line;
            while (std::getline(file, line)) {
                std::istringstream ss(line);
                std::string token;
                std::vector<float> anchor;
                while (std::getline(ss, token, ',')) {
                    anchor.push_back(std::stof(token));
                }
                anchors.push_back(convertAnchorVectorToRect(anchor));
            }
            file.close();
        }
        return anchors;
    }

    my::DetectionPostProcess::DetectionPostProcess() : m_anchors(generateAnchors()) {}

// The rest remains unchanged ...

7. Finally, in `demo.cpp`:
```cpp
...
int main(int argc, char* argv[]) {

    my::FaceDetection irisLandmarker("./models");
    cv::VideoCapture cap(0);
...
        irisLandmarker.loadImageToInput(rframe);
        irisLandmarker.runInference();

        cv::rectangle(rframe, irisLandmarker.getFaceRoi(), cv::Scalar(0, 255, 0));

        #if SHOW_FPS
        ...
```

Vibhu04 commented 2 years ago

@pntt3011 thanks a lot! Your prompt reply was very helpful.

As you had mentioned, currently the palm detection model, although jittery, returns the bounding box coordinates along with the 2D coordinates of 7 palm key points. Would you have an idea about the subsequent changes I would have to make to the repository to now incorporate the hand landmark model, in order to obtain the 3D coordinates of the 21 hand key points? Any ideas/suggestions would be very helpful.

Thanks again!

Vibhu04 commented 2 years ago

Hey @pntt3011, did you get a chance to look into how the hand landmark model can be incorporated? Any ideas/insights would be very valuable.

pntt3011 commented 2 years ago

Hi @Vibhu04, I'm sorry for my late reply. I don't have much free time on weekdays. I will try looking into it this weekend.

pntt3011 commented 2 years ago

Hello @Vibhu04, incorporating the hand landmark model turns out to be rather simple.

  1. Download hand_landmark_full.tflite or hand_landmark_lite.tflite from this link (and move it to the models folder, of course). (Edit 30/10/2022: Mediapipe removed the .tflite models from the master branch.)
  2. In FaceLandmark.cpp:
    #define FACE_LANDMARKS 21
    ...
    my::FaceLandmark::FaceLandmark(std::string modelPath):
    FaceDetection(modelPath),
    m_landmarkModel(modelPath + std::string("/hand_landmark_lite.tflite")) // Just change the model path
    {}
    ...
  3. In demo.cpp:

    ...
    int main(int argc, char* argv[]) {
    
    my::FaceLandmark irisLandmarker("./models"); // Change the object's class
        ...
        irisLandmarker.loadImageToInput(rframe);
        irisLandmarker.runInference();
    
        // For visualization
        // You can use cv::line to draw the connections between the landmarks instead
        for (auto landmark: irisLandmarker.getAllFaceLandmarks()) {
            cv::circle(rframe, landmark, 2, cv::Scalar(0, 255, 0), -1);
        }
    
        #if SHOW_FPS
        ...

    Note:

    • As suggested in this repo, you can improve the results by:
    • Calculating the hand direction from landmark[0] and landmark[2] of the palm detection output.
    • Rotating the ROI so that the hand direction points straight up (in my::FaceLandmark::runInference()).
    • When getting the results, projecting the landmarks back onto the original image (in my::FaceLandmark::getFaceLandmarkAt()).
    • However, this is rather involved; you can try it if the solution above is not accurate enough for your use case.
Vibhu04 commented 2 years ago

Thank you so much @pntt3011! Really appreciate the help.