Proposal for New Postprocessing Algorithm for Front View

YoungjaeDev commented 1 year ago

Background of the Issue: During the addition of preprocess to the system, it was observed that the inference results for the entire frame and Region of Interest (ROI) came out together. As a result, a merge algorithm was required for post-processing. The merge algorithm was borrowed from "sahi" and used as a reference (https://github.com/obss/sahi/blob/26acb891168a6338f2b9c89b2f002ded968d6cb5/sahi/postprocess/combine.py#L181-L301).

However, it was found that the post-processing algorithm is only suitable for drone view images with a similar composition to the images taken by UAV. In the drone view, there are not many overlapping boxes, so the post-processing algorithm is effective. However, in the front view images such as front-facing camera view, there are more overlapping boxes than expected. As a result, the Intersection over Smaller (IOS) metric of two different boxes is frequently below 0.5, which leads to merging of boxes that should not be merged. Consequently, the mAP, the metric of object detection, came out lower than the system without preprocessing.
Suggested part The algorithm to be proposed first has the same texture as the preprocess, but it was judged to be more suitable for the front view, and a lab_test was performed, and mAP increased (0.64 -> 0.67) than the existing tiling method (based SAHI) in our validation set. This is not just the Maritime scene I deal with, but the car-like front view will benefit from tiling (preprocessing) while reducing merge issues in post-processing.

Suggested Changes:
- The new algorithm is based on the assumption that the Full Inference part is already trustworthy, and the core procedure involves dividing the algorithm into two methods, GreedyNMMBaseOrder and GreedyNMMnoMergeOrder.
- Both methods sort the frames, with the Full Frame part at the front of the order. The key difference is that the improved algorithm, GreedyNMMnoMergeOrder, searches for non-full frame boxes first and deletes them if they overlap with the full frame boxes according to the Intersection over Smaller (IOS) standard.
- Implementation: The code will be sent via Discord, and the change is only in the processImpl part. In DeepStream, the frames are divided by ROI index, and the code contains a syntax like if (nbatch[element] != 3), where 3 is the index for Full Frame.

Dear @rjhowell44 ,

I hope this message finds you well. I would like to kindly request your assistance regarding an algorithm implementation that we have in our project.

The algorithm is already implemented, but it works as a service in DSL and is intertwined with various codes. Therefore, I am kindly requesting your help in reviewing the implementation and making any necessary changes.

Thank you for your time and attention to this matter.

YoungjaeDev commented 1 year ago

@rjhowell44 Is it okay for me to start working first?

YoungjaeDev commented 1 year ago

@rjhowell44

hello. This is pph in the form of pgie_src implemented for the algorithm I proposed above. First of all, I couldn't interpipe, and first of all I focused on the implementation below . So, I proceeded in the form of inserting two input_sources, and distinguished Full and ROI inferences through batch_id = 0,1

However, At last nvds_remove_obj_meta_from_frame and nvds_add_obj_meta_to_frame do not seem to work. Can you confirm the use of the api?

uint osd_sink_pad_buffer_probe(void* buffer, void* client_data)
{
    GstBuffer* pGstBuffer = (GstBuffer*)buffer;

    NvDsBatchMeta* pBatchMeta = gst_buffer_get_nvds_batch_meta(pGstBuffer);

    int number_of_class = category_name.size();
    /**
     * @brief FrameObjectData is a variable to distinguish between Full Frame and slice inference.
     * batch_id 0 is Full Frame, 1 is Slice
     */
    std::vector<std::vector<std::vector<float>>> FrameObjectData;
    std::vector<std::vector<NvDsObjectMeta*>> ObjectMetaArray;

    FrameObjectData.resize(number_of_class);
    ObjectMetaArray.resize(number_of_class);

    std::vector<ObjectData> Output;

    // 
    NvDsFrameMeta* pBatchZeroFrameMeta = NULL;
    NvDsFrameMeta* pBatchOneFrameMeta = NULL;

    // For each frame in the batched meta data
    for (NvDsMetaList* pFrameMetaList = pBatchMeta->frame_meta_list;
         pFrameMetaList; pFrameMetaList = pFrameMetaList->next)
    {
        // Check for valid frame data
        NvDsFrameMeta* pFrameMeta = (NvDsFrameMeta*)(pFrameMetaList->data);
        if (pFrameMeta != nullptr)
        {
            auto frame_number = pFrameMeta->frame_num;
            auto batch_id = pFrameMeta->batch_id;

            // std::cout << "batch_id: " << batch_id << "\n";

            if (batch_id == 0) {
                pBatchZeroFrameMeta = pFrameMeta;
            }
            else {
                pBatchOneFrameMeta = pFrameMeta;
            }

            NvDsMetaList* pObjectMetaList = pFrameMeta->obj_meta_list;

            // For each detected object in the frame.
            while (pObjectMetaList)
            {
                // Check for valid object data
                NvDsObjectMeta* pObjectMeta = (NvDsObjectMeta*)(pObjectMetaList->data);

                std::vector<float> object = {
                    pObjectMeta->rect_params.left,
                    pObjectMeta->rect_params.top,
                    pObjectMeta->rect_params.left + pObjectMeta->rect_params.width,
                    pObjectMeta->rect_params.top + pObjectMeta->rect_params.height,
                    pObjectMeta->confidence,
                    static_cast<float>(batch_id)
                };

                FrameObjectData[pObjectMeta->class_id].emplace_back(object);
                ObjectMetaArray[pObjectMeta->class_id].emplace_back(pObjectMeta);
                pObjectMetaList = pObjectMetaList->next;
            }
        }          
    }

    // nomerge_process(FrameObjectData, 0.5f, 4, pBatchZeroFrameMeta, &ObjectMetaArray);

    const float ios_threshold = 0.5f;
    const int class_num = 4;

    assert(pBatchZeroFrameMeta != NULL);

    for (int cls_id = 0; cls_id < class_num; cls_id++)
    {
        if (FrameObjectData[cls_id].size() == 0)
            continue;

        // Convert the input 2D vector to an NdArray
        nc::NdArray<float> nd_predictions{FrameObjectData[cls_id]};

        // Extract the bounding box coordinates and assert that they are all non-negative
        auto x1 = nd_predictions(nd_predictions.rSlice(), 0);
        auto y1 = nd_predictions(nd_predictions.rSlice(), 1);
        auto x2 = nd_predictions(nd_predictions.rSlice(), 2);
        auto y2 = nd_predictions(nd_predictions.rSlice(), 3);
        assert(nc::all(x1 >= 0.0f).item() == 1);
        assert(nc::all(y1 >= 0.0f).item() == 1);
        assert(nc::all(x2 >= 0.0f).item() == 1);
        assert(nc::all(y2 >= 0.0f).item() == 1);

        // Extract the confidence scores and assert that they are all between 0 and 1
        // Column 4 refers to confidence_score
        auto scores = nd_predictions(nd_predictions.rSlice(), 4);
        assert(nc::all(scores >= 0.0f && scores <= 1.0f).item() == 1);
        auto scores_to_vec = scores.toStlVector();

        // Extract the batch numbers and assert that they are all non-negative integers
        // Column 4 refers to batch_id
        auto nbatch = nd_predictions(nd_predictions.rSlice(), 5);
        assert(nc::all(nbatch >= 0.0f && nbatch <= 1.0f).item() == 1);
        auto nbatch_to_vec = nbatch.toStlVector();

        // Initialize a vector of empty vectors, where each element will store the indices of boxes to be merged together
        std::vector<std::vector<int>> keep_to_merge_list(
            nd_predictions.shape().rows);

        // Concatenate the confidence scores and batch numbers, and argsort the resulting vector by confidence
        std::vector<std::pair<float, float>> concat =
            concat_vectors(scores_to_vec, nbatch_to_vec);

        std::vector<uint32_t> order;        
        order = _argsort(concat);

        // convert std::vector<int> to nc::NdArray
        nc::NdArray<nc::uint32> nd_order{order};

        std::vector<uint32_t> add, remove;

        // Calculate the areas of each box and assert that they are all non-negative
        auto areas = (x2 - x1) * (y2 - y1);
        assert(nc::all(areas >= 0.0f).item() == 1);

        // Perform the GreedyNMM algorithm
        while (nc::shape(nd_order).size() > 0)
        {
            // The last one has the highest score
            auto idx = nd_order[-1];
            nd_order = nd_order(0, nc::Slice(0, -1));

            // If there is only one box left, it becomes the merge target
            if (nc::shape(nd_order).size() == 0)
            {
                keep_to_merge_list[idx].emplace_back(idx);
                break;
            }

            // Find the intersection boxes
            auto xx1 = nc::maximum(x1[idx] * nc::ones<float>(nd_order.shape()),
                                   x1[nd_order]);
            auto yy1 = nc::maximum(y1[idx] * nc::ones<float>(nd_order.shape()),
                                   y1[nd_order]);
            auto xx2 = nc::minimum(x2[idx] * nc::ones<float>(nd_order.shape()),
                                   x2[nd_order]);
            auto yy2 = nc::minimum(y2[idx] * nc::ones<float>(nd_order.shape()),
                                   y2[nd_order]);

            // Find height and width of the intersection boxes
            auto w = nc::clip(xx2 - xx1, 0.0f, float(1e9));
            auto h = nc::clip(yy2 - yy1, 0.0f, float(1e9));

            // Make sure the width and height are non-negative
            assert(nc::all(w >= 0.0f && h >= 0.0f).item() == 1);

            // Find the intersection area
            auto intersection = w * h;

            // Find the areas of BBoxes according the indices in order
            auto rem_areas =
                nc::minimum(areas[idx] * nc::ones<float>(nd_order.shape()),
                            areas[nd_order]);

            // Calculate the intersection over smaller (ios) value
            auto ios = (intersection / rem_areas);

            // Process the intersection over smaller values and merge boxes

            // If the current idx is not an index for the Full Frame,
            if (nbatch[idx] != 0)
            {
                // Create a mask for IOS scores less than the threshold
                auto ios_mask = ios < ios_threshold;

                // Iterate forwards over the elements of the nd_order ndarray
                // to find the last slice element
                int cut_idx = 0;
                for (auto it = nd_order.begin(); it != nd_order.end(); ++it)
                {
                    // Access the current element using the reverse iterator
                    nc::uint32 element = *it;

                    // Check if the current element is equal to 0(Full Frame)
                    if (nbatch[element] != 0)
                    {
                        break;
                    }

                    ++cut_idx;

                    // else
                    // {
                    //     ios_mask[nd_order.rend() - rit - 1] = 1;
                    // }
                }

                // Extract the IOS mask for the Full Frame and slices up to the last slice
                auto full_and_slice_ios_mask = ios_mask(ios_mask.rSlice(), nc::Slice(0, cut_idx));

                // Check if the size of full_and_slice_ios_mask is greater than 0
                if (nc::size(full_and_slice_ios_mask) > 0)
                {
                    // Find the row indices where the masked IOU mask is 0
                    // If it overlaps, it's 1.
                    auto not_full_and_slice_ios_mask = nc::logical_not(full_and_slice_ios_mask);

                    // Check if there is at least one 1 in the not_ios_mask ndarray
                    bool has_ones = nc::any(not_full_and_slice_ios_mask).item();

                    if (has_ones)
                    {
                        // Find the column indices where the masked IOS mask is True
                        auto [_, colIndices] = nc::nonzero(not_full_and_slice_ios_mask);

                        if (colIndices.shape().cols > 0)
                        {
                            // Convert the column indices to a vector
                            auto col2Vec = colIndices.toStlVector();

                            // For each column index, add the corresponding `idx` to the `keep_to_merge_list`
                            for (auto &full_idx : col2Vec) {
                                keep_to_merge_list[nd_order[full_idx]].emplace_back(idx);
                            }
                        }

                        assert(FrameObjectData[cls_id][idx][5] == 1);
                        remove.emplace_back(idx);

                        // There is at least one row with an IOS value of 1
                        // return;
                    }
                }
                else {
                    // All rows have a non-zero IOS value
                    keep_to_merge_list[idx].emplace_back(idx);
                    nd_order = nd_order[ios_mask];

                    assert(FrameObjectData[cls_id][idx][5] == 1);
                    add.emplace_back(idx);

                    // 
                    auto rm_idx = 0;
                    for (auto it = ios_mask.begin(); it != ios_mask.end(); ++it, ++rm_idx)
                    {
                        // It(0) means overlap by ios_value
                        if (*it == 0)
                        {
                            assert(FrameObjectData[cls_id][rm_idx][5] == 1);
                            remove.emplace_back(rm_idx);
                        }
                    }
                }
            }
            else
            {
                // Add the input index to the list of indices to be merged
                keep_to_merge_list[idx].emplace_back(idx);
            }

        }

        // Merge overlapping predicted boxes using the GreedyNMM algorithm
        for (auto it = keep_to_merge_list.begin();
             it != keep_to_merge_list.end();
             ++it)
        {
            // Check if the cluster has any boxes to merge
            if ((*it).size() > 0)
            {
                // Merge the boxes in the cluster using calculateBoxUnion
                for (auto &merge_ind : *it)
                {
                    FrameObjectData[cls_id][it - keep_to_merge_list.begin()] =
                        calculateBoxUnion(
                            FrameObjectData[cls_id][it - keep_to_merge_list.begin()],
                            FrameObjectData[cls_id][merge_ind]);
                }

                ObjectMetaArray[cls_id][it - keep_to_merge_list.begin()]->rect_params.left = 
                    FrameObjectData[cls_id][it - keep_to_merge_list.begin()][0];
                ObjectMetaArray[cls_id][it - keep_to_merge_list.begin()]->rect_params.top = 
                    FrameObjectData[cls_id][it - keep_to_merge_list.begin()][1];
                ObjectMetaArray[cls_id][it - keep_to_merge_list.begin()]->rect_params.width = 
                    FrameObjectData[cls_id][it - keep_to_merge_list.begin()][2] - 
                    FrameObjectData[cls_id][it - keep_to_merge_list.begin()][0];
                ObjectMetaArray[cls_id][it - keep_to_merge_list.begin()]->rect_params.height = 
                    FrameObjectData[cls_id][it - keep_to_merge_list.begin()][3] - 
                    FrameObjectData[cls_id][it - keep_to_merge_list.begin()][1]; 
            }
        }

        for (auto &x: remove) 
        {
            nvds_remove_obj_meta_from_frame(pBatchOneFrameMeta, ObjectMetaArray[cls_id][x]);
        }

        for (auto &x: add) {
            // assert(FrameObjectData[cls_id][x][5] == -1);
            nvds_add_obj_meta_to_frame(pBatchZeroFrameMeta, ObjectMetaArray[cls_id][x], NULL);
        }
    }

    return DSL_PAD_PROBE_OK;
}

rjhowell44 commented 1 year ago

@youngjae-avikus when you say you couldn't interpipe... what does that mean???

YoungjaeDev commented 1 year ago

I was talking about this part

// Create a list of Pipeline Components to add to the new Pipeline.
const wchar_t* components[] = {L"input-source", 
                                       L"input-source-2",
                                       L"preprocessor",
                                       L"primary-gie",
                                       L"tiler",
                                       L"on-screen-display",
                                       L"window-sink",
                                       NULL};

YoungjaeDev commented 1 year ago

hello @rjhowell44 I just uploaded two files in example/cpp to the issue-938-interpipe-preproc branch. First, before testing the new_pph, I put a preproc in two Interpipe-srcs, and the Roi is drawn, but it seems that the inference is not inferred only in the Roi corresponding to it at all, but for the entire frame. Can you please check on this issue?

prominenceai / deepstream-services-library

Proposal for New Postprocessing Algorithm for Front View #938