pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Getting different results in multiple runs on the same image #2535

Open massimiliano96 opened 2 years ago

massimiliano96 commented 2 years ago

Hi All!

I'm doing object detection with YOLOv3 trained on custom data. The problem is that if I run my network on the same image multiple times, I get slightly different results: the bounding boxes differ by a few pixels. Here's my code:

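        cv::Mat frame = cv::imread(<PATH_TO_AN_IMAGE>);
        cv::Mat blob;
        net_input_scale_ = 1.0 / 255;
        net_input_size_ = cv::Size(416, 416);
        // builds a 416x416 blob scaled to [0, 1], swapping R and B, without cropping
        cv::dnn::blobFromImage(frame, blob, net_input_scale_, net_input_size_, cv::Scalar(), true, false);
        cv::dnn::Net net; // loading of the trained model into net is not shown in this snippet
        // sets the input to the network
        net.setInput(blob);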

        std::vector<cv::String> net_out_names_ = net.getUnconnectedOutLayersNames();

        // runs the forward pass to get output of the output layers
        std::vector<cv::Mat> outs;
        net.forward(outs, net_out_names_);

        std::vector<int> class_ids;
        std::vector<float> confidences;
        std::vector<cv::Rect2d> rects;

        for (auto &mat : outs)
        {
            float *data = (float*)mat.data;
            for (int i = 0; i < mat.rows; ++i, data += mat.cols)
            {
                cv::Mat scores = mat.row(i).colRange(5, mat.cols);
                cv::Point class_id_point;
                double confidence;
                cv::minMaxLoc(scores, 0, &confidence, 0, &class_id_point);
                if (confidence > 0.3)
                {
                    int centerX = (int)(data[0] * frame.cols);
                    int centerY = (int)(data[1] * frame.rows);
                    int width = (int)(data[2] * frame.cols);
                    int height = (int)(data[3] * frame.rows);
                    int left = centerX - width / 2;
                    int top = centerY - height / 2;

                    class_ids.push_back(class_id_point.x);
                    confidences.push_back((float)confidence);
                    rects.push_back(cv::Rect2d(left, top, width, height));
                }
            }
        }

        std::vector<cv::Rect2d> rect_out;
        std::vector<unsigned> idx_out;
        std::vector<int> indices;
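        // score threshold and NMS threshold are both 0.3
        const float confidence_threshold = 0.3f;
        const float nms_threshold = 0.3f;
        cv::dnn::NMSBoxes(rects, confidences, confidence_threshold, nms_threshold, indices);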
        for(int idx : indices)
        {
            rect_out.push_back(rects[idx]);
            idx_out.push_back(class_ids[idx]);
        }

Is this normal? How can I manage this?
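
To put a number on "differ by a few pixels", one option is to compare the boxes from two runs directly. The following is a minimal sketch and not part of the original post: `Box`, `iou`, `max_pixel_diff`, and `runs_match` are hypothetical names, and it assumes both runs return their boxes in the same order. It does not explain why the runs differ; it only measures by how much.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical box type for illustration; it mirrors the
// (left, top, width, height) layout of the cv::Rect2d values in the post.
struct Box {
    double left, top, width, height;
};

// Intersection over union of two axis-aligned boxes (1.0 means identical).
double iou(const Box& a, const Box& b) {
    const double x1 = std::max(a.left, b.left);
    const double y1 = std::max(a.top, b.top);
    const double x2 = std::min(a.left + a.width, b.left + b.width);
    const double y2 = std::min(a.top + a.height, b.top + b.height);
    const double inter = std::max(0.0, x2 - x1) * std::max(0.0, y2 - y1);
    const double uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

// Largest absolute difference, in pixels, over the four box coordinates.
double max_pixel_diff(const Box& a, const Box& b) {
    return std::max({std::abs(a.left - b.left), std::abs(a.top - b.top),
                     std::abs(a.width - b.width), std::abs(a.height - b.height)});
}

// True if both runs produced the same number of boxes and every pair of
// boxes (matched by index, which is an assumption) agrees within tolerance_px.
bool runs_match(const std::vector<Box>& run_a, const std::vector<Box>& run_b,
                double tolerance_px) {
    if (run_a.size() != run_b.size()) return false;
    for (std::size_t i = 0; i < run_a.size(); ++i) {
        if (max_pixel_diff(run_a[i], run_b[i]) > tolerance_px) return false;
    }
    return true;
}
```

Calling `runs_match(first_run, second_run, 0.0)` checks for exact agreement, while a small positive tolerance shows whether the deviation stays within a bound you consider acceptable.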

malfonsoNeoris commented 2 years ago

Hi! I'm having a similar issue, using Python, OpenCV, and YOLOv4-tiny trained on a custom dataset. I'm processing a stream in which a truck is parked, and I get different boxes for each frame. The most undesired effect is that the confidence varies between 0.7 and 0.9. Again, the truck is parked: there is no movement in the image, only some minor "stream" artifacts.