pntt3011 / mediapipe_face_iris_cpp

Real-time Face and Iris Landmarks Detection using C++
GNU General Public License v3.0

How easy is it to adapt your code for object detection? #11

Open ababo opened 2 years ago

ababo commented 2 years ago

I would like to reuse/generalize your code with ssdlite_object_detection.tflite. What would NUM_SIZES and AnchorOptions look like in this case (assuming DETECTION_SIZE = 320, NUM_BOXES = 2034, NUM_COORD = 4)?

pntt3011 commented 2 years ago

Hello. My generateAnchors is simplified because a face always has an aspect ratio of 1:1. This is mediapipe's version. Alternatively, you can refer to this fork, whose author has already implemented it on top of my code. (Note: you need to add a few lines of code to his code.)

const float scale =
          CalculateScale(options.min_scale(), options.max_scale(),
                         last_same_stride_layer, options.strides_size());
      /* Add this block 
      if (last_same_stride_layer == 0 &&
          options.reduce_boxes_in_lowest_layer()) {
        // For first layer, it can be specified to use predefined anchors.
        aspect_ratios.push_back(1.0);
        aspect_ratios.push_back(2.0);
        aspect_ratios.push_back(0.5);
        scales.push_back(0.1);
        scales.push_back(scale);
        scales.push_back(scale);
      } else {
      */
        for (int aspect_ratio_id = 0;
             aspect_ratio_id < options.aspect_ratios_size();
             ++aspect_ratio_id) {

Mediapipe object detection uses this config.

ababo commented 2 years ago

Thank you @pntt3011. Do you know what values sizes and numLayers should contain for object detection? How can I deduce them from the config above?

ababo commented 2 years ago

Understood, will try to follow the fork you suggested, thanks.

pntt3011 commented 2 years ago

The way I use those values is not correct in general (it is only sufficient for the face detection case), so it cannot be applied to other models. You can replace generateAnchors with the following code:

#include <cmath>
#include <vector>
#include <opencv2/core.hpp>

struct AnchorsParams
{
    int input_size_width;
    int input_size_height;

    float min_scale;
    float max_scale;

    float anchor_offset_x;
    float anchor_offset_y;

    int num_layers;
    std::vector<int> feature_map_width;
    std::vector<int> feature_map_height;
    std::vector<int> strides;
    std::vector<float> aspect_ratios;

    bool reduce_boxes_in_lowest_layer;
};

static float calculateScale(float min_scale, float max_scale, int stride_index, int num_strides)
{
    if (num_strides == 1)
    {
        return (min_scale + max_scale) * 0.5f;
    }
    // else
    return min_scale + (max_scale - min_scale) * 1.0 * stride_index / (num_strides - 1.0f);
}

std::vector<cv::Rect2f> generateAnchors(const AnchorsParams &anchor_params)
{
    std::vector<cv::Rect2f> anchors;

    for (int layer_id = 0; layer_id < (int)anchor_params.strides.size();)
    {
        std::vector<float> anchor_height;
        std::vector<float> anchor_width;
        std::vector<float> aspect_ratios;
        std::vector<float> scales;

        int last_same_stride_layer = layer_id;
        while (last_same_stride_layer < (int)anchor_params.strides.size() && anchor_params.strides[last_same_stride_layer] == anchor_params.strides[layer_id])
        {
            const float scale = calculateScale(anchor_params.min_scale, anchor_params.max_scale, last_same_stride_layer, anchor_params.strides.size());
            if (last_same_stride_layer == 0 && anchor_params.reduce_boxes_in_lowest_layer)
            {
                // For first layer, it can be specified to use predefined anchors.
                aspect_ratios.push_back(1.0);
                aspect_ratios.push_back(2.0);
                aspect_ratios.push_back(0.5);
                scales.push_back(0.1);
                scales.push_back(scale);
                scales.push_back(scale);
            }
            else
            {
                for (int aspect_ratio_id = 0; aspect_ratio_id < (int)anchor_params.aspect_ratios.size(); aspect_ratio_id++)
                {
                    aspect_ratios.push_back(anchor_params.aspect_ratios[aspect_ratio_id]);
                    scales.push_back(scale);
                }

                const float scale_next = last_same_stride_layer == (int)anchor_params.strides.size() - 1 ? 1.0f : calculateScale(anchor_params.min_scale, anchor_params.max_scale, last_same_stride_layer + 1, anchor_params.strides.size());
                scales.push_back(std::sqrt(scale * scale_next));
                aspect_ratios.push_back(1.0);
            }
            last_same_stride_layer++;
        }

        for (int i = 0; i < (int)aspect_ratios.size(); ++i)
        {
            const float ratio_sqrts = std::sqrt(aspect_ratios[i]);
            anchor_height.push_back(scales[i] / ratio_sqrts);
            anchor_width.push_back(scales[i] * ratio_sqrts);
        }

        int feature_map_height = 0;
        int feature_map_width = 0;
        const int stride = anchor_params.strides[layer_id];
        feature_map_height = std::ceil(1.0f * anchor_params.input_size_height / stride);
        feature_map_width = std::ceil(1.0f * anchor_params.input_size_width / stride);

        for (int y = 0; y < feature_map_height; ++y)
        {
            for (int x = 0; x < feature_map_width; ++x)
            {
                for (int anchor_id = 0; anchor_id < (int)anchor_height.size(); ++anchor_id)
                {
                    float x_center = (x + anchor_params.anchor_offset_x) * 1.f / feature_map_width;
                    // Normalize y by the feature map height, not the width.
                    float y_center = (y + anchor_params.anchor_offset_y) * 1.f / feature_map_height;
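                    // Use the per-anchor size computed above; a fixed 1x1 size only suits the face detection case.
                    float w = anchor_width[anchor_id];
                    float h = anchor_height[anchor_id];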
                    anchors.push_back(cv::Rect2f(x_center - w / 2.f, y_center - h / 2.f, w, h));
                }
            }
        }
        layer_id = last_same_stride_layer;
    }

    return anchors;
}

AnchorsParams getObjectDetectionSSDParams()
{
    AnchorsParams anchor_options;
    anchor_options.num_layers = 6;
    anchor_options.min_scale = 0.2;
    anchor_options.max_scale = 0.95;
    anchor_options.input_size_height = 320;
    anchor_options.input_size_width = 320;
    anchor_options.anchor_offset_x = 0.5f;
    anchor_options.anchor_offset_y = 0.5f;
    anchor_options.strides.push_back(16);
    anchor_options.strides.push_back(32);
    anchor_options.strides.push_back(64);
    anchor_options.strides.push_back(128);
    anchor_options.strides.push_back(256);
    anchor_options.strides.push_back(512);
    anchor_options.aspect_ratios.push_back(1.0);
    anchor_options.aspect_ratios.push_back(2.0);
    anchor_options.aspect_ratios.push_back(0.5);
    anchor_options.aspect_ratios.push_back(3.0);
    anchor_options.aspect_ratios.push_back(0.3333);
    anchor_options.reduce_boxes_in_lowest_layer = true;
    return anchor_options;
}

my::DetectionPostProcess::DetectionPostProcess() : m_anchors(generateAnchors(getObjectDetectionSSDParams())) {}
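
As a quick sanity check on these parameters, the number of generated anchors should equal the model's NUM_BOXES. The sketch below is a hypothetical helper (countAnchors is not part of the repository) that follows the same layer-grouping logic as generateAnchors, but only counts anchors instead of building them. It assumes every layer other than the reduced first one contributes one anchor per aspect ratio plus one interpolated-scale anchor.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: counts the anchors produced by SSD-style anchor
// parameters without building the boxes themselves.
int countAnchors(int input_height, int input_width,
                 const std::vector<int> &strides,
                 int num_aspect_ratios,
                 bool reduce_boxes_in_lowest_layer)
{
    int total = 0;
    const int num_layers = static_cast<int>(strides.size());
    for (int layer_id = 0; layer_id < num_layers;)
    {
        // Consecutive layers with the same stride share one feature map,
        // so their anchors are summed per grid cell.
        int anchors_per_cell = 0;
        int last = layer_id;
        while (last < num_layers && strides[last] == strides[layer_id])
        {
            if (last == 0 && reduce_boxes_in_lowest_layer)
                anchors_per_cell += 3; // three predefined anchors
            else
                anchors_per_cell += num_aspect_ratios + 1; // ratios + interpolated scale
            ++last;
        }

        const int stride = strides[layer_id];
        const int fm_h = static_cast<int>(std::ceil(1.0f * input_height / stride));
        const int fm_w = static_cast<int>(std::ceil(1.0f * input_width / stride));
        total += fm_h * fm_w * anchors_per_cell;

        layer_id = last;
    }
    return total;
}
```

With a 320x320 input, strides {16, 32, 64, 128, 256, 512}, five aspect ratios and reduce_boxes_in_lowest_layer enabled, the feature maps are 20x20, 10x10, 5x5, 3x3, 2x2 and 1x1, giving 1200 + 600 + 150 + 54 + 24 + 6 = 2034 anchors, which matches NUM_BOXES = 2034 from the question.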

In DetectionPostProcess, I only choose the bounding box with the highest score. For object detection, however, you need to implement non-maximum suppression instead, so that multiple objects can be detected and duplicate boxes are removed.
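
A minimal sketch of greedy non-maximum suppression is shown here, under some assumptions: the Detection struct, the iou helper and the nonMaxSuppression function are hypothetical names, not part of this repository or of mediapipe, and the suppression is class-agnostic (per-class suppression would also need a class id per box). Boxes are assumed to be already decoded into top-left corner plus size.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical detection type for illustration only.
struct Detection
{
    float x, y, w, h; // top-left corner and size
    float score;      // confidence of the best class
};

// Intersection over union of two axis-aligned boxes.
static float iou(const Detection &a, const Detection &b)
{
    const float x1 = std::max(a.x, b.x);
    const float y1 = std::max(a.y, b.y);
    const float x2 = std::min(a.x + a.w, b.x + b.w);
    const float y2 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: keep the highest-scoring box, drop every remaining box that
// overlaps a kept box by more than iou_threshold, and repeat.
std::vector<Detection> nonMaxSuppression(std::vector<Detection> dets,
                                         float score_threshold,
                                         float iou_threshold)
{
    // Discard low-confidence boxes first.
    dets.erase(std::remove_if(dets.begin(), dets.end(),
                              [score_threshold](const Detection &d)
                              { return d.score < score_threshold; }),
               dets.end());

    // Visit candidates from highest to lowest score.
    std::sort(dets.begin(), dets.end(),
              [](const Detection &a, const Detection &b)
              { return a.score > b.score; });

    std::vector<Detection> kept;
    for (const Detection &candidate : dets)
    {
        bool suppressed = false;
        for (const Detection &k : kept)
        {
            if (iou(candidate, k) > iou_threshold)
            {
                suppressed = true;
                break;
            }
        }
        if (!suppressed)
            kept.push_back(candidate);
    }
    return kept;
}
```

The two thresholds are tuning parameters; values around 0.5 are a common starting point, but the right choice depends on the model and the scene.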

ababo commented 2 years ago

Is this code tested? Doesn't seem to work...

pntt3011 commented 2 years ago

I'm sorry, I haven't tested it. I don't have much free time these days, and my current laptop doesn't have this project. Can you show me the error?