ababo opened this issue 2 years ago
Hello,
My generateAnchors
is simplified, since a face always has a 1:1 aspect ratio.
This is MediaPipe's version.
Alternatively, you can refer to this fork, which already implements it on top of my code.
(Note: you still need to add a few lines to his code:)
const float scale =
    CalculateScale(options.min_scale(), options.max_scale(),
                   last_same_stride_layer, options.strides_size());
/* Add this block:
if (last_same_stride_layer == 0 &&
    options.reduce_boxes_in_lowest_layer()) {
  // For the first layer, it can be specified to use predefined anchors.
  aspect_ratios.push_back(1.0);
  aspect_ratios.push_back(2.0);
  aspect_ratios.push_back(0.5);
  scales.push_back(0.1);
  scales.push_back(scale);
  scales.push_back(scale);
} else {
*/
for (int aspect_ratio_id = 0;
     aspect_ratio_id < options.aspect_ratios_size();
     ++aspect_ratio_id) {
Mediapipe object detection uses this config.
Thank you @pntt3011. Do you know what values sizes
and numLayers
would contain in the object detection case? How can I deduce them from the config above?
Understood, will try to follow the fork you suggested, thanks.
The way I use those values is not correct (it is still sufficient in the face detection case), so it cannot be applied generally.
You can replace the generateAnchors
with the following code:
struct AnchorsParams
{
    int input_size_width;
    int input_size_height;
    float min_scale;
    float max_scale;
    float anchor_offset_x;
    float anchor_offset_y;
    int num_layers;
    std::vector<int> feature_map_width;
    std::vector<int> feature_map_height;
    std::vector<int> strides;
    std::vector<float> aspect_ratios;
    bool reduce_boxes_in_lowest_layer;
};
static float calculateScale(float min_scale, float max_scale, int stride_index, int num_strides)
{
    if (num_strides == 1)
    {
        return (min_scale + max_scale) * 0.5f;
    }
    return min_scale + (max_scale - min_scale) * stride_index / (num_strides - 1.0f);
}
std::vector<cv::Rect2f> generateAnchors(const AnchorsParams &anchor_params)
{
    std::vector<cv::Rect2f> anchors;
    for (int layer_id = 0; layer_id < (int)anchor_params.strides.size();)
    {
        std::vector<float> anchor_height;
        std::vector<float> anchor_width;
        std::vector<float> aspect_ratios;
        std::vector<float> scales;

        // Merge consecutive layers that share the same stride.
        int last_same_stride_layer = layer_id;
        while (last_same_stride_layer < (int)anchor_params.strides.size() &&
               anchor_params.strides[last_same_stride_layer] == anchor_params.strides[layer_id])
        {
            const float scale = calculateScale(anchor_params.min_scale, anchor_params.max_scale,
                                               last_same_stride_layer, (int)anchor_params.strides.size());
            if (last_same_stride_layer == 0 && anchor_params.reduce_boxes_in_lowest_layer)
            {
                // For the first layer, it can be specified to use predefined anchors.
                aspect_ratios.push_back(1.0f);
                aspect_ratios.push_back(2.0f);
                aspect_ratios.push_back(0.5f);
                scales.push_back(0.1f);
                scales.push_back(scale);
                scales.push_back(scale);
            }
            else
            {
                for (int aspect_ratio_id = 0; aspect_ratio_id < (int)anchor_params.aspect_ratios.size(); ++aspect_ratio_id)
                {
                    aspect_ratios.push_back(anchor_params.aspect_ratios[aspect_ratio_id]);
                    scales.push_back(scale);
                }
                // One extra anchor at the scale interpolated with the next layer.
                const float scale_next = last_same_stride_layer == (int)anchor_params.strides.size() - 1
                                             ? 1.0f
                                             : calculateScale(anchor_params.min_scale, anchor_params.max_scale,
                                                              last_same_stride_layer + 1, (int)anchor_params.strides.size());
                scales.push_back(std::sqrt(scale * scale_next));
                aspect_ratios.push_back(1.0f);
            }
            last_same_stride_layer++;
        }

        for (int i = 0; i < (int)aspect_ratios.size(); ++i)
        {
            const float ratio_sqrts = std::sqrt(aspect_ratios[i]);
            anchor_height.push_back(scales[i] / ratio_sqrts);
            anchor_width.push_back(scales[i] * ratio_sqrts);
        }

        const int stride = anchor_params.strides[layer_id];
        const int feature_map_height = std::ceil(1.0f * anchor_params.input_size_height / stride);
        const int feature_map_width = std::ceil(1.0f * anchor_params.input_size_width / stride);

        for (int y = 0; y < feature_map_height; ++y)
        {
            for (int x = 0; x < feature_map_width; ++x)
            {
                for (int anchor_id = 0; anchor_id < (int)anchor_height.size(); ++anchor_id)
                {
                    const float x_center = (x + anchor_params.anchor_offset_x) / feature_map_width;
                    // Bug fix: divide by feature_map_height here, not feature_map_width.
                    const float y_center = (y + anchor_params.anchor_offset_y) / feature_map_height;
                    const float w = 1.f;
                    const float h = 1.f;
                    anchors.push_back(cv::Rect2f(x_center - w / 2.f, y_center - h / 2.f, w, h));
                }
            }
        }
        layer_id = last_same_stride_layer;
    }
    return anchors;
}
AnchorsParams getObjectDetectionSSDParams()
{
    AnchorsParams anchor_options;
    anchor_options.num_layers = 6;
    anchor_options.min_scale = 0.2f;
    anchor_options.max_scale = 0.95f;
    anchor_options.input_size_height = 320;
    anchor_options.input_size_width = 320;
    anchor_options.anchor_offset_x = 0.5f;
    anchor_options.anchor_offset_y = 0.5f;
    anchor_options.strides = {16, 32, 64, 128, 256, 512};
    anchor_options.aspect_ratios = {1.0f, 2.0f, 0.5f, 3.0f, 0.3333f};
    anchor_options.reduce_boxes_in_lowest_layer = true;
    return anchor_options; // this return was missing; without it the caller gets garbage
}
my::DetectionPostProcess::DetectionPostProcess() : m_anchors(generateAnchors(getObjectDetectionSSDParams())) {}
In DetectionPostProcess, I only choose the bounding box with the highest score. However, in object detection you need to implement non-max suppression instead, to detect multiple objects and remove duplicate ones.
Is this code tested? Doesn't seem to work...
I'm sorry that I haven't tested it. I don't have too much free time these days and my current laptop doesn't have this project. Can you show me the error?
I would like to reuse/generalize your code with
ssdlite_object_detection.tflite
. What would NUM_SIZES
and AnchorOptions
look like in this case (assuming DETECTION_SIZE = 320, NUM_BOXES = 2034, NUM_COORD = 4
)?