ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Is there any way I can use yolov5 with opencv dnn #239

Closed chaUAV closed 4 years ago

chaUAV commented 4 years ago

🚀 Feature

Is there any way I can use yolov5 with opencv dnn

edurenye commented 4 years ago

Yes @chaUAV, it is possible. You need to export the model using https://github.com/ultralytics/yolov5/blob/master/models/export.py (there is a usage example inside the file). The model will be exported to ONNX, and it can then be imported in OpenCV using cv2.dnn.readNetFromONNX(model_path).

Or at least that is the supposed way; I ran into this issue when doing that: #250
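
For example, a minimal sketch of the two steps (the paths and the 640x640 default are illustrative):

# Step 1 (shell), run from the yolov5 repo root:
#   python models/export.py --weights yolov5s.pt --img 640 --batch 1

# Step 2: load the exported model with OpenCV DNN
import cv2

net = cv2.dnn.readNetFromONNX("yolov5s.onnx")
print("loaded layers:", len(net.getLayerNames()))  # quick sanity check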

glenn-jocher commented 4 years ago

@chaUAV @edurenye I've added a pinned documentation issue now at the top of https://github.com/ultralytics/yolov5/issues for this, hopefully this will help everyone to understand the basic functionality.

The INT64's remain one mystery among many in the export process though.

edurenye commented 4 years ago

Thanks @glenn-jocher. I think it's the labels, but I need to test, and I'm having some problems with Docker while trying to update the NVIDIA drivers to 450, so it might take me a while.

chaUAV commented 4 years ago

Thank you guys. I had already exported to ONNX and used it with OpenCV, but I got the same error as #250. Is there anything I could do to fix it? @glenn-jocher

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

MohamedAliRashad commented 4 years ago

This issue needs to be reopened @glenn-jocher; we need an alternative to the ONNX method.

glenn-jocher commented 4 years ago

@MohamedAliRashad sure. Are you trying to export an official YOLOv5 model for use with opencv? I can provide versions of these in ONNX format with outputs structured correctly, but they will lack NMS functionality. Is there a way to append an NMS module in ONNX?

MohamedAliRashad commented 4 years ago

@glenn-jocher I was thinking about readNetFromDarknet, like with the previous versions of YOLO

glenn-jocher commented 4 years ago

@MohamedAliRashad sorry I've just never used opencv dnn. Can you provide demo code for how this would work ideally? As I said I can provide fully functional exports in all supported formats for COCO and VOC trained YOLOv5 models. What format do you need it in exactly, and how is NMS handled?

MohamedAliRashad commented 4 years ago

@glenn-jocher it's quite simple actually. First, you read the model weights and configuration to construct the network:

net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)

Then, we run inference on an input:

ln = net.getUnconnectedOutLayersNames()  # names of the output layers
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
detections = net.forward(ln)

And finally, we filter the detections by threshold with code like this:

import numpy as np

boxes = []
confidences = []
classIDs = []
for output in detections:
    for detection in output:
        scores = detection[5:]
        classID = np.argmax(scores)
        confidence = scores[classID]
        if confidence > args["confidence"]:
            # W, H are the dimensions of the input image
            box = detection[0:4] * np.array([W, H, W, H])
            (centerX, centerY, width, height) = box.astype("int")
            x = int(centerX - (width / 2))
            y = int(centerY - (height / 2))
            boxes.append([x, y, int(width), int(height)])
            confidences.append(float(confidence))
            classIDs.append(classID)
idxs = cv2.dnn.NMSBoxes(boxes, confidences, args["confidence"], args["threshold"])

sean-wade commented 3 years ago

Has anyone done this properly, using opencv-dnn to run inference on yolov5 models? Is there any guide?

leeyunhome commented 3 years ago

@MohamedAliRashad sure. Are you trying to export an official YOLOv5 model for use with opencv? I can provide versions of these in ONNX format with outputs structured correctly, but they will lack NMS functionality. Is there a way to append an NMS module in ONNX?

Hello?

Can I take this onnx model and test it out?

Thank you.

leeyunhome commented 3 years ago

@chaUAV @edurenye I've added a pinned documentation issue now at the top of https://github.com/ultralytics/yolov5/issues for this, hopefully this will help everyone to understand the basic functionality.

The INT64's remain one mystery among many in the export process though.

Hello?

The pinned issue seems to have disappeared from the top over time. Could you tell me the URL of the document?

Thank you.

glenn-jocher commented 3 years ago

@leeyunhome I've exported a YOLOv5s.onnx model at 640x640 here. It has two outputs, boxes (25200,4), and classes (25200,80). https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.onnx

leeyunhome commented 3 years ago

@leeyunhome I've exported a YOLOv5s.onnx model at 640x640 here. It has two outputs, boxes (25200,4), and classes (25200,80). https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.onnx

Thank you for answer. I have an additional question.

1. What did you change from the original repo to produce this output?

2. I would like to know the contents of the torch.onnx.export call used for this conversion.

3. How should I interpret the contents of the output tensors? I don't understand how the two outputs, boxes (25200, 4) and classes (25200, 80), represent boxes and classes. Can you tell me what I need to study in this regard? I guess the 80 in classes (25200, 80) is the number of classes in a file like coco.names, but I don't know about 25200.

Thank you

glenn-jocher commented 3 years ago

@leeyunhome this is an optimized ONNX model that we create using a private repo (ultralytics/yolov5-export). It's part of our paid product offerings. It works well for fixed output shapes, e.g. if you want an ONNX model to view 720p webcam streams.

25200 is the number of output points from a 640x640 image. You pass these through NMS to get your detections.
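
For reference, 25200 comes from the three YOLOv5 output grids at strides 8, 16 and 32, with 3 anchors per cell (a quick check in Python):

# 640x640 input -> 80x80, 40x40 and 20x20 grids, 3 anchors each
points = sum((640 // s) ** 2 for s in (8, 16, 32)) * 3
print(points)  # (6400 + 1600 + 400) * 3 = 25200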

vishal-nasre commented 3 years ago

Does anyone have a prepared notebook on yolov5 with OpenCV for live streams? I am about to drop the plan to use yolov5 🙂 because of this.

glenn-jocher commented 3 years ago

@vishal-nasre YOLOv5 runs inference out of the box on a variety of sources including remote streams (RTSP, HTTP etc.) and local webcams. See https://github.com/ultralytics/yolov5#quick-start-examples for details.

glenn-jocher commented 3 years ago

@edurenye @chaUAV @MohamedAliRashad @a954217436 @leeyunhome good news 😃! Your original issue may now be fixed ✅ in PR #4833 by @SamFC10. This PR implements architecture updates to allow for ONNX-exported YOLOv5 models to be used with OpenCV DNN.

To receive this update, pull the latest code (for example, git pull from within your yolov5/ directory, or a fresh git clone).

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

snehitvaddi commented 3 years ago

Good to hear this update @glenn-jocher. Can you briefly outline the steps involved one last time, like how to export the ONNX model, and whether there are any additional changes we need to make for OpenCV compatibility?

glenn-jocher commented 3 years ago

@edurenye @chaUAV @MohamedAliRashad @a954217436 @leeyunhome steps for OpenCV DNN inference:

# Export to ONNX
python export.py --weights yolov5s.pt --include onnx --simplify

# Inference
python detect.py --weights yolov5s.onnx  # ONNX Runtime inference
# -- or --
python detect.py --weights yolov5s.onnx --dnn  # OpenCV DNN inference

snehitvaddi commented 3 years ago

Has anyone implemented inference through a webcam with OpenCV using an exported ONNX model? 🤔

I know python detect.py --weights yolov5s.onnx --dnn is for inference, but I'm trying to implement something in real time from a webcam. It would be really helpful if anyone could share an OpenCV webcam implementation of an exported ONNX model.
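
The only part I have so far is the capture loop (a sketch, assuming webcam index 0; the DNN inference would go where the comment is):

import cv2

capture = cv2.VideoCapture(0)  # webcam instead of a video file
while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break
    # run the exported-ONNX DNN inference + postprocessing on `frame` here
    cv2.imshow("webcam", frame)
    if cv2.waitKey(1) != -1:  # any key stops the loop
        break
capture.release()
cv2.destroyAllWindows()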

glenn-jocher commented 3 years ago

@snehitvaddi read the README

PauloMendes33 commented 2 years ago

As it appears presently, OpenCV DNN does not support layers from YOLOv5. It will only be supported in the future, but when?

glenn-jocher commented 2 years ago

@PauloMendes33 YOLOv5 inference with DNN is super easy:

python export.py --weights yolov5s.pt --include onnx
python detect.py --weights yolov5s.onnx --dnn

alexanderaltena commented 2 years ago

When I run the proposed python scripts I'm getting the following error:

Traceback (most recent call last):
  File "detect.py", line 243, in <module>
    main(opt)
  File "detect.py", line 238, in main
    run(**vars(opt))
  File "/Users/vascowerk/.local/share/virtualenvs/visaTorch-ajivcrNC/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "detect.py", line 79, in run
    model = DetectMultiBackend(weights, device=device, dnn=dnn)
  File "/Users/vascowerk/visaTorch/yolov5/models/common.py", line 316, in __init__
    net = cv2.dnn.readNetFromONNX(w)
cv2.error: OpenCV(4.5.3-openvino) ../opencv/modules/dnn/src/onnx/onnx_importer.cpp:2129: error: (-2:Unspecified error) in function 'handleNode'

Node [Unsqueeze]:(339) parse error: OpenCV(4.5.3-openvino) ../opencv/modules/dnn/src/onnx/onnx_importer.cpp:1551: error: (-215:Assertion failed) node_proto.input_size() == 1 in function 'handleNode'

I'm new to object detection so it's probably a stupid mistake on my end, but I have no clue how to resolve this.

During the creation of the ONNX model I'm also getting the following warning, so I'm not sure if this is related:

ONNX: starting export with onnx 1.10.2...
/Users/vascowerk/visaTorch/yolov5/models/yolo.py:57: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (repeated 6 times)
ONNX: export success, saved as yolov5s.onnx (29.3 MB)
ONNX: run --dynamic ONNX model inference with: 'python detect.py --weights yolov5s.onnx'

glenn-jocher commented 2 years ago

@alexanderaltena DNN requires OpenCV >= 4.5.4; your traceback shows 4.5.3-openvino. Everything works correctly with a recent version.
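
A quick way to check which version your Python environment actually loads:

import cv2

print(cv2.__version__)  # must be 4.5.4 or newer for YOLOv5 ONNX models via DNN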

alexanderaltena commented 2 years ago

Man I'd kiss you if I could thnx!!

PauloMendes33 commented 2 years ago

@alexanderaltena What's the news about it? Were you able to run inference using OpenCV 4.5.4?

Presently I am installing OpenCV 4.5.4 to test the software; afterwards I will report back.

snehitvaddi commented 2 years ago

@alexanderaltena, are you running inference through the command line, or using the exported ONNX model in Python code?

PauloMendes33 commented 2 years ago

I am able to read the .onnx model (using cv::dnn::readNetFromONNX("yolov5.onnx")), however I am unable to make predictions on the image. The image output is a downscaled image without any prediction, that is, using the code provided in my next comment.

PauloMendes33 commented 2 years ago

The code is as follows.

#include <...>  // header name lost in formatting
#include <...>
#include <...>
#include <opencv4/opencv2/opencv.hpp>

using namespace std;

// YOLO
#include <...>
#include <opencv4/opencv2/dnn.hpp>
#include <opencv4/opencv2/dnn/all_layers.hpp>

constexpr float CONFIDENCE_THRESHOLD = 0;
constexpr float NMS_THRESHOLD = 0.4;
// number of classes to detect
//constexpr int NUM_CLASSES = 80;
constexpr int NUM_CLASSES = 5; // to detect only one class -> the first in the coco_names_txt file list ?!??
// colors for bounding boxes
const cv::Scalar colors[] = { {0, 255, 255}, {255, 255, 0}, {0, 255, 0}, {255, 0, 0} };
const auto NUM_COLORS = sizeof(colors)/sizeof(colors[0]);

doleron commented 2 years ago

@MohamedAliRashad sorry I've just never used opencv dnn. Can you provide demo code for how this would work ideally? As I said I can provide fully functional exports in all supported formats for COCO and VOC trained YOLOv5 models. What format do you need it in exactly, and how is NMS handled?

@glenn-jocher @PauloMendes33 I use this code to run YOLO V5 with OpenCV DNN:

import cv2
import time
import sys
import numpy as np

def build_model(is_cuda):
    net = cv2.dnn.readNet("config_files/yolov5s.onnx")
    if is_cuda:
        print("Attempty to use CUDA")
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)
    else:
        print("Running on CPU")
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
    return net

INPUT_WIDTH = 640
INPUT_HEIGHT = 640
SCORE_THRESHOLD = 0.2
NMS_THRESHOLD = 0.4
CONFIDENCE_THRESHOLD = 0.4

def detect(image, net):
    blob = cv2.dnn.blobFromImage(image, 1/255.0, (INPUT_WIDTH, INPUT_HEIGHT), swapRB=True, crop=False)
    net.setInput(blob)
    preds = net.forward()
    return preds

def load_capture():
    capture = cv2.VideoCapture("sample.mp4")
    return capture

def load_classes():
    class_list = []
    with open("config_files/classes.txt", "r") as f:
        class_list = [cname.strip() for cname in f.readlines()]
    return class_list

class_list = load_classes()

def wrap_detection(input_image, output_data):
    class_ids = []
    confidences = []
    boxes = []

    rows = output_data.shape[0]

    image_height, image_width, _ = input_image.shape  # shape is (h, w, c); both are equal here because format_yolov5 squares the input

    x_factor = image_width / INPUT_WIDTH
    y_factor =  image_height / INPUT_HEIGHT

    for r in range(rows):
        row = output_data[r]
        confidence = row[4]
        if confidence >= 0.4:

            classes_scores = row[5:]
            _, _, _, max_indx = cv2.minMaxLoc(classes_scores)
            class_id = max_indx[1]
            if (classes_scores[class_id] > .25):

                confidences.append(confidence)

                class_ids.append(class_id)

                x, y, w, h = row[0].item(), row[1].item(), row[2].item(), row[3].item() 
                left = int((x - 0.5 * w) * x_factor)
                top = int((y - 0.5 * h) * y_factor)
                width = int(w * x_factor)
                height = int(h * y_factor)
                box = np.array([left, top, width, height])
                boxes.append(box)

    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.25, 0.45) 

    result_class_ids = []
    result_confidences = []
    result_boxes = []

    for i in indexes:
        result_confidences.append(confidences[i])
        result_class_ids.append(class_ids[i])
        result_boxes.append(boxes[i])

    return result_class_ids, result_confidences, result_boxes

def format_yolov5(frame):

    row, col, _ = frame.shape
    _max = max(col, row)
    result = np.zeros((_max, _max, 3), np.uint8)
    result[0:row, 0:col] = frame
    return result

colors = [(255, 255, 0), (0, 255, 0), (0, 255, 255), (255, 0, 0)]

is_cuda = len(sys.argv) > 1 and sys.argv[1] == "cuda"

net = build_model(is_cuda)
capture = load_capture()

start = time.time_ns()
frame_count = 0
total_frames = 0
fps = -1

while True:

    _, frame = capture.read()
    if frame is None:
        print("End of stream")
        break

    inputImage = format_yolov5(frame)
    outs = detect(inputImage, net)

    class_ids, confidences, boxes = wrap_detection(inputImage, outs[0])

    frame_count += 1
    total_frames += 1

    for (classid, confidence, box) in zip(class_ids, confidences, boxes):
         color = colors[int(classid) % len(colors)]
         cv2.rectangle(frame, box, color, 2)
         cv2.rectangle(frame, (box[0], box[1] - 20), (box[0] + box[2], box[1]), color, -1)
         cv2.putText(frame, class_list[classid], (box[0], box[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, .5, (0,0,0))

    if frame_count >= 30:
        end = time.time_ns()
        fps = 1000000000 * frame_count / (end - start)
        frame_count = 0
        start = time.time_ns()

    if fps > 0:
        fps_label = "FPS: %.2f" % fps
        cv2.putText(frame, fps_label, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

    cv2.imshow("output", frame)

    if cv2.waitKey(1) > -1:
        print("finished by user")
        break

print("Total frames: " + str(total_frames))

C++ version:

#include <fstream>

#include <opencv2/opencv.hpp>

std::vector<std::string> load_class_list()
{
    std::vector<std::string> class_list;
    std::ifstream ifs("config_files/classes.txt");
    std::string line;
    while (getline(ifs, line))
    {
        class_list.push_back(line);
    }
    return class_list;
}

void load_net(cv::dnn::Net &net, bool is_cuda)
{
    auto result = cv::dnn::readNet("config_files/yolov5s.onnx");
    if (is_cuda)
    {
        std::cout << "Attempty to use CUDA\n";
        result.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
        result.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA_FP16);
    }
    else
    {
        std::cout << "Running on CPU\n";
        result.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
        result.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    }
    net = result;
}

const std::vector<cv::Scalar> colors = {cv::Scalar(255, 255, 0), cv::Scalar(0, 255, 0), cv::Scalar(0, 255, 255), cv::Scalar(255, 0, 0)};

const float INPUT_WIDTH = 640.0;
const float INPUT_HEIGHT = 640.0;
const float SCORE_THRESHOLD = 0.2;
const float NMS_THRESHOLD = 0.4;
const float CONFIDENCE_THRESHOLD = 0.4;

struct Detection
{
    int class_id;
    float confidence;
    cv::Rect box;
};

cv::Mat format_yolov5(const cv::Mat &source) {
    int col = source.cols;
    int row = source.rows;
    int _max = MAX(col, row);
    cv::Mat result = cv::Mat::zeros(_max, _max, CV_8UC3);
    source.copyTo(result(cv::Rect(0, 0, col, row)));
    return result;
}

void detect(cv::Mat &image, cv::dnn::Net &net, std::vector<Detection> &output, const std::vector<std::string> &className) {
    cv::Mat blob;

    auto input_image = format_yolov5(image);

    cv::dnn::blobFromImage(input_image, blob, 1./255., cv::Size(INPUT_WIDTH, INPUT_HEIGHT), cv::Scalar(), true, false);
    net.setInput(blob);
    std::vector<cv::Mat> outputs;
    net.forward(outputs, net.getUnconnectedOutLayersNames());

    float x_factor = input_image.cols / INPUT_WIDTH;
    float y_factor = input_image.rows / INPUT_HEIGHT;

    float *data = (float *)outputs[0].data;

    const int dimensions = 85;
    const int rows = 25200;

    std::vector<int> class_ids;
    std::vector<float> confidences;
    std::vector<cv::Rect> boxes;

    for (int i = 0; i < rows; ++i) {

        float confidence = data[4];
        if (confidence >= CONFIDENCE_THRESHOLD) {

            float * classes_scores = data + 5;
            cv::Mat scores(1, className.size(), CV_32FC1, classes_scores);
            cv::Point class_id;
            double max_class_score;
            minMaxLoc(scores, 0, &max_class_score, 0, &class_id);
            if (max_class_score > SCORE_THRESHOLD) {

                confidences.push_back(confidence);

                class_ids.push_back(class_id.x);

                float x = data[0];
                float y = data[1];
                float w = data[2];
                float h = data[3];
                int left = int((x - 0.5 * w) * x_factor);
                int top = int((y - 0.5 * h) * y_factor);
                int width = int(w * x_factor);
                int height = int(h * y_factor);
                boxes.push_back(cv::Rect(left, top, width, height));
            }

        }

        data += 85;

    }

    std::vector<int> nms_result;
    cv::dnn::NMSBoxes(boxes, confidences, SCORE_THRESHOLD, NMS_THRESHOLD, nms_result);
    for (int i = 0; i < nms_result.size(); i++) {
        int idx = nms_result[i];
        Detection result;
        result.class_id = class_ids[idx];
        result.confidence = confidences[idx];
        result.box = boxes[idx];
        output.push_back(result);
    }
}

int main(int argc, char **argv)
{

    std::vector<std::string> class_list = load_class_list();

    cv::Mat frame;
    cv::VideoCapture capture("sample.mp4");
    if (!capture.isOpened())
    {
        std::cerr << "Error opening video file\n";
        return -1;
    }

    bool is_cuda = argc > 1 && strcmp(argv[1], "cuda") == 0;

    cv::dnn::Net net;
    load_net(net, is_cuda);

    auto start = std::chrono::high_resolution_clock::now();
    int frame_count = 0;
    float fps = -1;
    int total_frames = 0;

    while (true)
    {
        capture.read(frame);
        if (frame.empty())
        {
            std::cout << "End of stream\n";
            break;
        }

        std::vector<Detection> output;
        detect(frame, net, output, class_list);

        frame_count++;
        total_frames++;

        int detections = output.size();

        for (int i = 0; i < detections; ++i)
        {

            auto detection = output[i];
            auto box = detection.box;
            auto classId = detection.class_id;
            const auto color = colors[classId % colors.size()];
            cv::rectangle(frame, box, color, 3);

            cv::rectangle(frame, cv::Point(box.x, box.y - 20), cv::Point(box.x + box.width, box.y), color, cv::FILLED);
            cv::putText(frame, class_list[classId].c_str(), cv::Point(box.x, box.y - 5), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 0));
        }

        if (frame_count >= 30)
        {

            auto end = std::chrono::high_resolution_clock::now();
            fps = frame_count * 1000.0 / std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();

            frame_count = 0;
            start = std::chrono::high_resolution_clock::now();
        }

        if (fps > 0)
        {

            std::ostringstream fps_label;
            fps_label << std::fixed << std::setprecision(2);
            fps_label << "FPS: " << fps;
            std::string fps_label_str = fps_label.str();

            cv::putText(frame, fps_label_str.c_str(), cv::Point(10, 25), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(0, 0, 255), 2);
        }

        cv::imshow("output", frame);

        if (cv::waitKey(1) != -1)
        {
            capture.release();
            std::cout << "finished by user\n";
            break;
        }
    }

    std::cout << "Total frames: " << total_frames << "\n";

    return 0;
}

More details can be found in this repository: https://github.com/doleron/yolov5-opencv-cpp-python

glenn-jocher commented 2 years ago

@doleron thanks for the examples! I've added a link to your repo on the export tutorial in https://docs.ultralytics.com/yolov5/tutorials/model_export

jebastin-nadar commented 2 years ago

@doleron I think YOLOv5 expects inputs in [0, 1] without any mean subtraction, just dividing by 255 should be enough.

blob = cv2.dnn.blobFromImage(image, 1/255.0, (INPUT_WIDTH, INPUT_HEIGHT), swapRB=True, crop=False)
doleron commented 2 years ago

@doleron I think YOLOv5 expects inputs in [0, 1] without any mean subtraction, just dividing by 255 should be enough.

blob = cv2.dnn.blobFromImage(image, 1/255.0, (INPUT_WIDTH, INPUT_HEIGHT), swapRB=True, crop=False)

@SamFC10 You're right. I just edited the code. Thanks!

alimousavi1377 commented 2 years ago

I have to run yolov5 for my project but I don't know how to run it. Previously we used OpenCV to load models, labels and weights, but now yolov5 does not support this structure. Can anybody help me with it?

doleron commented 2 years ago

I have to run yolov5 for my project but I don't know how to run it. Previously we used OpenCV to load models, labels and weights, but now yolov5 does not support this structure. Can anybody help me with it?

@alimousavi1377 YOLOv5 does support this structure. Check https://github.com/ultralytics/yolov5/issues/239#issuecomment-890400768 and https://github.com/ultralytics/yolov5/issues/6309#issuecomment-1019403257 for runnable examples of using YOLOv5 with built-in/custom models. In addition, if you really want to use OpenCV, check the C++/Python example a few replies above to learn how to use .onnx files, OpenCV and YOLOv5.

haimat commented 2 years ago

Thanks guys for this thread, helped me a lot. One question though: Any ideas how to use the YOLOv5 augment feature when running ONNX via CV2? Or would I need to manually implement it in my own code then?

doleron commented 2 years ago

Hi @haimat! As far as I understand, data augmentation only makes sense at training time. Once model training is finished, the final model structure/topology does not reflect any of the augmentation hyperparameterization set for training. The only influence of augmentation is on the dataset preparation, in order to achieve better weight generalization. In summary, IMO no action is needed for the ONNX conversion or during future model usage. PS: I'm only a YOLO user; please wait for a more accurate/reliable position from the ultralytics team though. PS2: are you facing some specific ONNX conversion error?

glenn-jocher commented 2 years ago

@haimat Test Time Augmentation (TTA) flag --augment is only applied to PyTorch and TorchScript inference: https://github.com/ultralytics/yolov5/blob/1ff43702a8dea05b0d1140d4bb2cf6a2fe3e3ad4/models/common.py#L395-L400

@doleron see TTA tutorial for more info:

YOLOv5 Tutorials

Good luck 🍀 and let us know if you have any other questions!

haimat commented 2 years ago

@glenn-jocher Thanks for your reply, I was expecting something like that. So in other words, if I would like to use CV2+ONNX+TTA, I would need to implement the TTA part in my own code, right?

glenn-jocher commented 2 years ago

@haimat well that's an option. The TTA code can also live in the DetectMultiBackend() forward method. It just depends on what level the code is at; right now it's at a low level inside the torch and torchvision models.
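
For a standalone OpenCV pipeline, a rough flip-TTA sketch could look like this (detect_tta is just an illustrative name; it assumes the detect/wrap_detection helpers from the example above, [left, top, width, height] boxes, and an already square-formatted image):

import cv2

def detect_tta(image, net):
    # run the model on the original and a horizontally flipped copy,
    # merge the candidate boxes, then apply a single final NMS
    ids1, confs1, boxes1 = wrap_detection(image, detect(image, net)[0])
    flipped = cv2.flip(image, 1)
    ids2, confs2, boxes2 = wrap_detection(flipped, detect(flipped, net)[0])
    w = image.shape[1]
    # map flipped boxes back: left' = image_width - (left + width)
    boxes2 = [[w - (b[0] + b[2]), b[1], b[2], b[3]] for b in boxes2]
    boxes = [list(map(int, b)) for b in boxes1] + [list(map(int, b)) for b in boxes2]
    confs = [float(c) for c in confs1 + confs2]
    ids = ids1 + ids2
    keep = cv2.dnn.NMSBoxes(boxes, confs, 0.25, 0.45)
    return [ids[i] for i in keep], [confs[i] for i in keep], [boxes[i] for i in keep]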

kXborg commented 2 years ago

Hi all, If you are looking for a thorough analysis and implementation of Yolov5 with OpenCV DNN, check out our LearnOpenCV blog post here.

akbarali2019 commented 2 years ago

@glenn-jocher

python detect.py --weights best.onnx --dnn --source 0

When I use the above command, it works and detects well on my custom dataset. The problem is that it shows the class label as "person", but my custom dataset has only one class, labeled "ball". How do I change it to ball?

glenn-jocher commented 2 years ago

@akbarali2019 for ONNX inference class names are handled automatically. For DNN inference you must pass your --data yaml to detect.py to retrieve class names:

python detect.py --data DATA.yaml
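
If you are post-processing in your own OpenCV code instead of detect.py, you can read the same names yourself (a sketch, assuming a standard data yaml whose names entry is a list like ['ball']):

import yaml  # pip install pyyaml

with open("DATA.yaml") as f:
    names = yaml.safe_load(f)["names"]  # may also be a dict {id: name} in some yamls
print(names)  # e.g. ['ball'] for a single-class dataset
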
AbinJilson commented 2 years ago

@glenn-jocher @PauloMendes33 I use this code to run YOLO V5 with OpenCV DNN: [...]

When I run this code with my own custom ONNX file, I get this error:

  File "C:\Users\acer\.spyder-py3\metallic surface defect detection\untitled3.py", line 57, in wrap_detection
    if confidence >= 0.4:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Can anybody help me fix this?

alkhalisy commented 2 years ago

def wrap_detection(input_image, output_data):
    class_ids = []
    confidences = []
    boxes = []

    rows = output_data.shape[0]

    image_width, image_height, _ = input_image.shape

    x_factor = image_width / INPUT_WIDTH
    y_factor = image_height / INPUT_HEIGHT

    for r in range(rows):
        row = output_data[r]
        confidence = row[4]
        if confidence >= 0.4:
            classes_scores = row[5:]
            _, _, _, max_indx = cv2.minMaxLoc(classes_scores)
            class_id = max_indx[1]
            if (classes_scores[class_id] > .25):
                confidences.append(confidence)

                class_ids.append(class_id)

                x, y, w, h = row[0].item(), row[1].item(), row[2].item(), row[3].item()
                left = int((x - 0.5 * w) * x_factor)
                top = int((y - 0.5 * h) * y_factor)
                width = int(w * x_factor)
                height = int(h * y_factor)
                box = np.array([left, top, width, height])
                boxes.append(box)

    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.25, 0.45)

    result_class_ids = []
    result_confidences = []
    result_boxes = []

    for i in indexes:
        result_confidences.append(confidences[i])
        result_class_ids.append(class_ids[i])
        result_boxes.append(boxes[i])

    return result_class_ids, result_confidences, result_boxes

This returns the error:

Traceback (most recent call last):
  File "H:\workspace\my_phd_project\yolov5live_opencv_DNN_onnx\yolov5_opencv DNN onnx_5.py", line 126, in <module>
    class_ids, confidences, boxes = wrap_detection(inputImage, outs[0])
  File "H:\workspace\my_phd_project\yolov5live_opencv_DNN_onnx\yolov5_opencv DNN onnx_5.py", line 64, in wrap_detection
    if confidence >= 0.4:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Please, what is the problem?

kXborg commented 2 years ago

@alkhalisy,

Just check the shape of the outs once.
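
Something like this shows the mismatch (a sketch; preds is assumed to come straight from net.forward()):

import numpy as np

preds = net.forward()
print(np.asarray(preds).shape)  # e.g. (1, 25200, 85): (batch, candidates, 5 + num_classes)
# if each iterated row still carries the leading batch dimension, row[4] is an
# array rather than a scalar, which triggers the ambiguous-truth-value error
rows = preds[0]  # strip the batch dimension before iterating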

In my case, I had to format the code the following way. Check out the Source.

def post_process(input_image, outputs):
      # Lists to hold respective values while unwrapping.
      class_ids = []
      confidences = []
      boxes = []
      # Rows.
      rows = outputs[0].shape[1]
      image_height, image_width = input_image.shape[:2]
      # Resizing factor.
      x_factor = image_width / INPUT_WIDTH
      y_factor =  image_height / INPUT_HEIGHT
      # Iterate through detections.
      for r in range(rows):
            row = outputs[0][0][r]
            confidence = row[4]
            # Discard bad detections and continue.
            if confidence >= CONFIDENCE_THRESHOLD:
                  classes_scores = row[5:]
                  # Get the index of max class score.
                  class_id = np.argmax(classes_scores)
                  #  Continue if the class score is above threshold.
                  if (classes_scores[class_id] > SCORE_THRESHOLD):
                        confidences.append(confidence)
                        class_ids.append(class_id)
                        cx, cy, w, h = row[0], row[1], row[2], row[3]
                        left = int((cx - w/2) * x_factor)
                        top = int((cy - h/2) * y_factor)
                        width = int(w * x_factor)
                        height = int(h * y_factor)
                        box = np.array([left, top, width, height])
                        boxes.append(box)

alkhalisy commented 2 years ago

Dear Kukil, thanks for your response. The code above, from your repository, is the format for the function, but the same error still happens: "if confidence >= CONFIDENCE_THRESHOLD: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". So please, can you help me with this?