pntt3011 / mediapipe_face_iris_cpp

Real-time Face and Iris Landmarks Detection using C++
GNU General Public License v3.0
80 stars · 15 forks

How to extend this project for selfie segmentation? #20

Closed postacik closed 1 year ago

postacik commented 1 year ago

Hi, this repo is the only one I could find that uses mediapipe in a C++ application. Thanks for sharing it.

Could you please show the right way to add a class for selfie segmentation and create a demo similar to the Python example?

postacik commented 1 year ago

I started adding a helper class for selfie segmentation:

#include "SelfieSegmentation.hpp"

my::SelfieSegmentation::SelfieSegmentation(std::string modelDir) :
    my::ModelLoader(modelDir + std::string("/selfie_segmentation.tflite")) 
{}

void my::SelfieSegmentation::loadImageToInput(const cv::Mat& in, int index) {
    ModelLoader::loadImageToInput(in);
}

void my::SelfieSegmentation::runInference() {
    ModelLoader::runInference();
}

However, ModelLoader fails in the allocateTensors() function with the following message:

ERROR: Encountered unresolved custom op: Convolution2DTransposeBias.
ERROR: Node number 244 (Convolution2DTransposeBias) failed to prepare.

ERROR: Failed to apply the default TensorFlow Lite delegate indexed at 0.
Failed to allocate tensors.

Have you ever encountered such an error?

pntt3011 commented 1 year ago

Hi @postacik, I'm sorry that I hadn't noticed your issue until your latest response. About the segmentation model: I have been very busy recently and haven't had time to look at the segmentation graph. My code only covers running the tflite model, and the face detection graph is quite easy to implement. About the error you encountered: it happens because you are using the standard tflite, while mediapipe uses its own custom tflite build with some additional ops (same as #12).

postacik commented 1 year ago

Hi @pntt3011, thank you for your reply. I've just seen the resolver implementations in the mediapipe library and I'm trying to use them in the ModelLoader class for selfie segmentation. I'll report back here if I succeed.

postacik commented 1 year ago

I got allocateTensors() to succeed by registering the following custom op in the buildInterpreter() function:

void my::ModelLoader::buildInterpreter(int numThreads) {
    tflite::ops::builtin::BuiltinOpResolver resolver;

    resolver.AddCustom("Convolution2DTransposeBias", mediapipe::tflite_operations::RegisterConvolution2DTransposeBias());

    if (tflite::InterpreterBuilder(*m_model, resolver)(&m_interpreter) != kTfLiteOk) {
        std::cerr << "Failed to build interpreter." << std::endl;
        std::exit(1);
    }
    m_interpreter->SetNumThreads(numThreads);
}

I copied RegisterConvolution2DTransposeBias() function from mediapipe source code.

However, when I run ModelLoader::loadOutput() to get the output picture as a mask (see pictures below), the function returns a flat float array.

[screenshots: the model's output tensor shown as a raw float array]

Your ModelLoader class has a loadImageToInput() function but no corresponding loadImageFromOutput() function.

How can I convert this float array to a matrix the same size as the input image?

pntt3011 commented 1 year ago

@postacik I am delighted to hear about your success. You can try resizing the output tensor to the same width and height as the input image. As far as I know, Mask R-CNN also predicts the mask at a fixed size and then resizes it to the input size. Some papers, like PointRend, improve the upsampling with conv blocks.
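For reference, cv::resize with its default bilinear interpolation does exactly this kind of fixed-size-to-input-size upscaling. A minimal plain-C++ sketch of what that interpolation computes on a raw float mask (the helper names are mine, not from the repo, and the sketch assumes output dimensions greater than 1):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Bilinearly sample an H x W float mask at a fractional position.
// This is conceptually what cv::resize(..., INTER_LINEAR) does per pixel.
float sampleBilinear(const std::vector<float>& mask, int H, int W,
                     float y, float x) {
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    int x1 = std::min(x0 + 1, W - 1);
    int y1 = std::min(y0 + 1, H - 1);
    float fx = x - x0, fy = y - y0;
    float top = mask[y0 * W + x0] * (1 - fx) + mask[y0 * W + x1] * fx;
    float bot = mask[y1 * W + x0] * (1 - fx) + mask[y1 * W + x1] * fx;
    return top * (1 - fy) + bot * fy;
}

// Upscale a small H x W mask to (outH, outW).
std::vector<float> resizeMask(const std::vector<float>& mask, int H, int W,
                              int outH, int outW) {
    std::vector<float> out(outH * outW);
    for (int y = 0; y < outH; ++y)
        for (int x = 0; x < outW; ++x)
            out[y * outW + x] = sampleBilinear(
                mask, H, W,
                y * (H - 1.0f) / (outH - 1.0f),
                x * (W - 1.0f) / (outW - 1.0f));
    return out;
}
```

In practice you would just call cv::resize on the CV_32FC1 mat; the sketch only shows why in-between mask values come out as smooth blends of the model's fixed-size predictions.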

postacik commented 1 year ago

I think I need to do the reverse of this function from your code:

cv::Mat my::ModelLoader::preprocessImage(const cv::Mat& in, int idx) const {
    auto out = convertToRGB(in);

    std::vector<int> inputShape = getInputShape(idx);
    int H = inputShape[1];
    int W = inputShape[2]; 

    cv::Size wantedSize = cv::Size(W, H);
    cv::resize(out, out, wantedSize);

    /*
    Equivalent to (out - mean)/ std
    */
    out.convertTo(out, CV_32FC3, 1 / INPUT_NORM_STD, -INPUT_NORM_MEAN / INPUT_NORM_STD);
    return out;
}

Am I on the right path?
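As an aside, the convertTo call in the snippet above folds the normalization (out - mean) / std into a single affine transform a * x + b with a = 1/std and b = -mean/std. The equivalence can be checked in plain C++ (the constants below are placeholders for the repo's actual INPUT_NORM_MEAN / INPUT_NORM_STD values):

```cpp
#include <cassert>
#include <cmath>

// Placeholder values for illustration; the real INPUT_NORM_MEAN and
// INPUT_NORM_STD are defined in the repo's ModelLoader.
const float kMean = 127.5f;
const float kStd  = 127.5f;

// What convertTo(out, CV_32FC3, 1/std, -mean/std) computes per channel value:
float normalizeAffine(float pixel) {
    return pixel * (1.0f / kStd) + (-kMean / kStd);
}

// The intended normalization, written directly:
float normalizeDirect(float pixel) {
    return (pixel - kMean) / kStd;
}
```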

postacik commented 1 year ago

I wrote the helper functions below and they seem to work:

std::vector<float> my::SelfieSegmentation::getSegmentationMask() const {
    return ModelLoader::loadOutput(0);
}

cv::Mat my::SelfieSegmentation::loadOutputImage(int imageHeight, int imageWidth) const
{
    auto vec = getSegmentationMask();
    std::vector<int> outputShape = getOutputShape(0);
    int H = outputShape[1];
    int W = outputShape[2];
    cv::Mat out = cv::Mat(H, W, CV_32FC1);

    if (vec.size() == static_cast<size_t>(H) * W) // vec.size() counts floats, so compare against rows * cols
    {
        // copy vector to mat (vec.size() floats = vec.size() * sizeof(float) bytes)
        memcpy(out.data, vec.data(), vec.size() * sizeof(float));
    }
    }
    cv::Size wantedSize = cv::Size(imageWidth, imageHeight);
    cv::resize(out, out, wantedSize);
    return out;
}

[screenshot: the resulting segmentation mask]

I would appreciate your valuable comments.
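As a possible next step (not part of the repo), the resized float mask can drive background replacement, as in the Python selfie segmentation example. A sketch of per-pixel alpha blending on raw float buffers; the function name and the soft-blend choice are mine:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Blend foreground over background using the segmentation mask as alpha.
// Mask values are the model's per-pixel person probability in [0, 1];
// fg, bg, and mask must all have the same length.
std::vector<float> alphaBlend(const std::vector<float>& fg,
                              const std::vector<float>& bg,
                              const std::vector<float>& mask) {
    std::vector<float> out(fg.size());
    for (std::size_t i = 0; i < fg.size(); ++i)
        out[i] = mask[i] * fg[i] + (1.0f - mask[i]) * bg[i];
    return out;
}
```

With cv::Mat the same idea can be expressed with element-wise mul() on float mats (after replicating the single-channel mask across the three color channels); thresholding the mask instead of soft-blending gives a hard cutout.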

pntt3011 commented 1 year ago

Hi @postacik, I think your approach is correct. Congratulations!

postacik commented 1 year ago

Thanks for your help, closing...