chaUAV closed this issue 4 years ago.
Yes @chaUAV it is possible. You need to export it using https://github.com/ultralytics/yolov5/blob/master/models/export.py; inside the file there is a usage example. The model will be exported as an ONNX model and can then be imported in OpenCV using cv2.dnn.readNetFromONNX(model_path).
Or at least that is how it is supposed to work; I ran into this issue when trying it: #250
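For reference, the intended workflow looks roughly like this (a minimal sketch; the file names and 640x640 input size are assumptions, and the raw output still needs YOLO box decoding plus NMS):
import cv2

net = cv2.dnn.readNetFromONNX("yolov5s.onnx")   # model exported by models/export.py
img = cv2.imread("image.jpg")
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (640, 640), swapRB=True, crop=False)
net.setInput(blob)
out = net.forward()   # raw predictions; decode boxes/scores and run NMS afterwards
print(out.shape)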
@chaUAV @edurenye I've added a pinned documentation issue now at the top of https://github.com/ultralytics/yolov5/issues for this, hopefully this will help everyone to understand the basic functionality.
The INT64's remain one mystery among many in the export process though.
Thanks @glenn-jocher. I think it's the labels, but I need to test; I'm also having some problems with Docker while trying to update the NVIDIA drivers to 450, so it might take me a while.
Thank you guys, I had already exported to ONNX and used it with OpenCV, but I got the same error as #250. Is there anything I can do to fix it? @glenn-jocher
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue needs to be reopened @glenn-jocher, we need an alternative to the ONNX method.
@MohamedAliRashad sure. Are you trying to export an official YOLOv5 model for use with opencv? I can provide versions of these in ONNX format with outputs structured correctly, but they will lack NMS functionality. Is there a way to append an NMS module in ONNX?
@glenn-jocher
I was thinking about readNetFromDarkNet
like the previous versions of YOLO
@MohamedAliRashad sorry I've just never used opencv dnn. Can you provide demo code for how this would work ideally? As I said I can provide fully functional exports in all supported formats for COCO and VOC trained YOLOv5 models. What format do you need it in exactly, and how is NMS handled?
@glenn-jocher it's quite simple actually.
First, you read the model weights and configuration to construct the network
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
then, we infer an input
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
ln = net.getUnconnectedOutLayersNames()  # names of the output layers
detections = net.forward(ln)
And finally we apply thresholds to filter the detections, with code like this
boxes = []
confidences = []
classIDs = []
for output in detections:
for detection in output:
scores = detection[5:]
classID = np.argmax(scores)
confidence = scores[classID]
if confidence > args["confidence"]:
# W, H are the dimensions of the input image
box = detection[0:4] * np.array([W, H, W, H])
(centerX, centerY, width, height) = box.astype("int")
x = int(centerX - (width / 2))
y = int(centerY - (height / 2))
boxes.append([x, y, int(width), int(height)])
confidences.append(float(confidence))
classIDs.append(classID)
idxs = cv2.dnn.NMSBoxes(boxes, confidences, args["confidence"], args["threshold"])
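The idxs returned by NMSBoxes are the indices of the boxes that survive suppression; a minimal sketch of how they would typically be used afterwards (drawing details here are assumptions):
if len(idxs) > 0:
    for i in idxs.flatten():
        (x, y, w, h) = boxes[i]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, str(classIDs[i]), (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)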
Has anyone done it properly, using opencv-dnn to run inference on yolov5 models? Is there any guide?
Hello?
Can I take this onnx model and test it out?
Thank you.
Hello?
The pinned issue seems to have disappeared from the top over time. Could you tell me the URL of the document?
Thank you.
@leeyunhome I've exported a YOLOv5s.onnx model at 640x640 here. It has two outputs, boxes (25200,4), and classes (25200,80). https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.onnx
Thank you for the answer. I have an additional question.
What did you change from the original repo to produce this output?
I would like to know the contents of the torch.onnx.export function used for this conversion.
You said I should be able to interpret the contents of the output tensor, but I don't understand how the two outputs, boxes (25200,4) and classes (25200,80), map to boxes and classes. Can you tell me what I need to study in this regard? I guess the 80 in classes (25200,80) is the number of classes, as in a file like coco.names, but I don't know where 25200 comes from.
Thank you
@leeyunhome this is an optimized ONNX model that we create using a private repo (ultralytics/yolov5-export). It's part of our paid product offerings. It works well for fixed output shapes, i.e. if you want an ONNX model to view 720p webcam streams.
25200 is the number of output points from a 640x640 image. You pass these through NMS to get your detections.
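For reference, 25200 comes from the three YOLOv5 detection scales on a 640x640 input (strides 8, 16 and 32 give 80x80, 40x40 and 20x20 grids) with 3 anchors per grid cell:
# 3 anchors x (80*80 + 40*40 + 20*20) grid cells
num_points = 3 * sum((640 // s) ** 2 for s in (8, 16, 32))
print(num_points)  # 25200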
Does anyone have a prepared notebook for YOLOv5 with OpenCV for a live stream? I am about to drop the plan to use YOLOv5 :) due to this.
@vishal-nasre YOLOv5 runs inference out of the box on a variety of sources including remote streams (RTSP, HTTP etc.) and local webcams. See https://github.com/ultralytics/yolov5#quick-start-examples for details.
@edurenye @chaUAV @MohamedAliRashad @a954217436 @leeyunhome good news 😃! Your original issue may now be fixed ✅ in PR #4833 by @SamFC10. This PR implements architecture updates to allow for ONNX-exported YOLOv5 models to be used with OpenCV DNN.
To receive this update:
- Run git pull from within your yolov5/ directory, or git clone https://github.com/ultralytics/yolov5 again
- Reload the model from PyTorch Hub with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
- Run sudo docker pull ultralytics/yolov5:latest to update your Docker image
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!
Good to hear about this update @glenn-jocher. Can you briefly list the steps involved one last time, e.g. how to export the ONNX model, and whether there are any additional changes we need to make for OpenCV compatibility?
@edurenye @chaUAV @MohamedAliRashad @a954217436 @leeyunhome steps for OpenCV DNN inference:
# Export to ONNX
python export.py --weights yolov5s.pt --include onnx --simplify
# Inference
python detect.py --weights yolov5s.onnx # ONNX Runtime inference
# -- or --
python detect.py --weights yolov5s.onnx --dnn # OpenCV DNN inference
Has anyone implemented inference through a webcam with OpenCV using the exported ONNX model? 🤔
I know python detect.py --weights yolov5s.onnx --dnn
is for inference, but I'm trying to implement something in real time from a webcam. It would be really helpful if anyone could share an OpenCV webcam implementation of the exported ONNX model.
@snehitvaddi read the README
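For what it's worth, detect.py already accepts a webcam index as a source, so a minimal real-time run with the exported model should just be (assuming the webcam is device 0):
python detect.py --weights yolov5s.onnx --dnn --source 0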
As it appears presently, OpenCV DNN does not support the layers used by YOLOv5. It will only be supported in the future, but when?
@PauloMendes33 YOLOv5 inference with DNN is super easy:
python export.py --weights yolov5s.pt --include onnx
python detect.py --weights yolov5s.onnx --dnn
When I run the proposed python scripts I'm getting the following error:
Traceback (most recent call last):
File "detect.py", line 243, in
Node [Unsqueeze]:(339) parse error: OpenCV(4.5.3-openvino) ../opencv/modules/dnn/src/onnx/onnx_importer.cpp:1551: error: (-215:Assertion failed) node_proto.input_size() == 1 in function 'handleNode'
I'm new to object detection so it's probably a stupid mistake on my end, but I have no clue how to resolve this.
During the creation of the ONNX model I'm also getting the following warning, so I'm not sure if this is related:
ONNX: starting export with onnx 1.10.2...
/Users/vascowerk/visaTorch/yolov5/models/yolo.py:57: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (repeated 6x)
ONNX: export success, saved as yolov5s.onnx (29.3 MB)
ONNX: run --dynamic ONNX model inference with: 'python detect.py --weights yolov5s.onnx'
@alexanderaltena DNN requires opencv>=4.5.4. Everything works correctly.
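A quick way to confirm which OpenCV build Python is actually picking up (assuming the opencv-python package):
import cv2
print(cv2.__version__)   # should print 4.5.4 or newer for YOLOv5 ONNX with DNN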
Man I'd kiss you if I could thnx!!
@alexanderaltena Any news about it? Were you able to run inference using OpenCV 4.5.4?
I am currently installing OpenCV 4.5.4 to test the software; I will report back afterwards.
@alexanderaltena, are you running inference through the command line, or using the exported ONNX model in Python code?
I am able to read the .onnx model (using cv::dnn::readNetFromONNX("yolov5.onnx")), however I am unable to get predictions on the image. The output is just a downscaled image without any predictions, that is, using the code provided in my last comment.
The code is as follows:
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <opencv2/opencv.hpp>

using namespace std;

// YOLO parameters
constexpr float CONFIDENCE_THRESHOLD = 0;
constexpr float NMS_THRESHOLD = 0.4;
// number of classes to detect
//constexpr int NUM_CLASSES = 80;
constexpr int NUM_CLASSES = 5; // to detect only one class -> the first in the coco names .txt file list ?!??

// colors for bounding boxes
const cv::Scalar colors[] = { {0, 255, 255}, {255, 255, 0}, {0, 255, 0}, {255, 0, 0} };
const auto NUM_COLORS = sizeof(colors) / sizeof(colors[0]);

int main(int argc, char** argv)
{
    cout << CV_VERSION << endl;

    cv::Mat im_1 = cv::imread("im_14_RGB.jpg", cv::IMREAD_COLOR);
    if (!im_1.data) {
        cout << "\n\t Could not open or find the image 1" << endl;
    }

    // downscale the image to the network input size
    int down_width = 640;
    int down_height = 640;
    cv::resize(im_1, im_1, cv::Size(down_width, down_height), cv::INTER_LINEAR);

    // YOLOv5
    // read the COCO class names from the .txt file
    std::vector<std::string> class_names;
    {
        std::ifstream class_file("coco.names");
        std::string line;
        while (std::getline(class_file, line))
            class_names.push_back(line);
    }

    // load the network (the commented lines are the previous YOLOv4 configuration files)
    //auto net = cv::dnn::readNetFromDarknet("custom-yolov4-detector.cfg", "custom-yolov4-detector_best.weights");
    //auto net = cv::dnn::readNetFromDarknet("yolov4.cfg", "custom-yolov4-tiny-detector_best.weights");
    //cv::dnn::Net net = cv::dnn::readNetFromONNX("best.onnx");
    auto net = cv::dnn::readNetFromONNX("yolov5.onnx");
    cout << "here" << endl;

    // use the GPU for image processing
    //net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    //net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
    // use the CPU for image processing
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    auto output_names = net.getUnconnectedOutLayersNames();

    cv::Mat blob;
    std::vector<cv::Mat> detections;
    std::vector<cv::Rect> boxes[NUM_CLASSES];
    std::vector<float> scores[NUM_CLASSES];
    std::vector<int> indices[NUM_CLASSES];

    // create a 4-dimensional blob from the image
    cv::dnn::blobFromImage(im_1, blob, 0.00392, cv::Size(im_1.rows, im_1.cols), cv::Scalar(), true, false, CV_32F);
    net.setInput(blob);
    net.forward(detections, output_names);

    // object detection (parsing written for the YOLOv4 output layout)
    for (auto& output : detections) {
        const auto num_boxes = output.rows;
        for (int i = 0; i < num_boxes; i++) {
            // the 5 predictions for each bounding box: x, y, w, h, confidence
            auto x = output.at<float>(i, 0) * im_1.cols;
            auto y = output.at<float>(i, 1) * im_1.rows;
            auto width = output.at<float>(i, 2) * im_1.cols;
            auto height = output.at<float>(i, 3) * im_1.rows;
            cv::Rect rect(x - width / 2, y - height / 2, width, height);

            for (int c = 0; c < NUM_CLASSES; c++) {
                auto confidence = *output.ptr<float>(i, 5 + c);
                if (confidence >= CONFIDENCE_THRESHOLD) {
                    boxes[c].push_back(rect);
                    scores[c].push_back(confidence);
                }
            }
        }
    }

    // non-maximum suppression of the bounding boxes and their confidence scores
    // (removes duplicate boxes that identify the same object)
    for (int c = 0; c < NUM_CLASSES; c++)
        cv::dnn::NMSBoxes(boxes[c], scores[c], 0.0, NMS_THRESHOLD, indices[c]);

    // draw the detected objects and their confidence scores as bounding boxes
    for (int c = 0; c < NUM_CLASSES; c++) {
        for (size_t i = 0; i < indices[c].size(); ++i) {
            const auto color = colors[c % NUM_COLORS];
            auto idx = indices[c][i];
            const auto& rect = boxes[c][idx];
            cv::rectangle(im_1, cv::Point(rect.x, rect.y), cv::Point(rect.x + rect.width, rect.y + rect.height), color, 3);

            // write the class of the object contained in the bounding box (e.g. pedestrian or bottle)
            std::ostringstream label_ss;
            label_ss << class_names[c] << ": " << std::fixed << std::setprecision(2) << scores[c][idx];
            auto label = label_ss.str();

            int baseline;
            auto label_bg_sz = cv::getTextSize(label.c_str(), cv::FONT_HERSHEY_COMPLEX_SMALL, 1, 1, &baseline);
            // background rectangle behind the label
            cv::rectangle(im_1, cv::Point(rect.x, rect.y - label_bg_sz.height - baseline - 10), cv::Point(rect.x + label_bg_sz.width, rect.y), color, cv::FILLED);
            // write the class of the detected object
            cv::putText(im_1, label.c_str(), cv::Point(rect.x, rect.y - baseline - 5), cv::FONT_HERSHEY_COMPLEX_SMALL, 1, cv::Scalar(0, 0, 0));
        }
    }

    cv::namedWindow("YOLOV5 detection", cv::WINDOW_NORMAL);
    cv::imshow("YOLOV5 detection", im_1);
    cv::waitKey(0);
    cv::imwrite("YOLOV5_res.jpg", im_1);
    return 0;
}
@glenn-jocher @PauloMendes33 I use this code to run YOLO V5 with OpenCV DNN:
import cv2
import time
import sys
import numpy as np
def build_model(is_cuda):
net = cv2.dnn.readNet("config_files/yolov5s.onnx")
if is_cuda:
print("Attempty to use CUDA")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)
else:
print("Running on CPU")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
return net
INPUT_WIDTH = 640
INPUT_HEIGHT = 640
SCORE_THRESHOLD = 0.2
NMS_THRESHOLD = 0.4
CONFIDENCE_THRESHOLD = 0.4
def detect(image, net):
blob = cv2.dnn.blobFromImage(image, 1/255.0, (INPUT_WIDTH, INPUT_HEIGHT), swapRB=True, crop=False)
net.setInput(blob)
preds = net.forward()
return preds
def load_capture():
capture = cv2.VideoCapture("sample.mp4")
return capture
def load_classes():
class_list = []
with open("config_files/classes.txt", "r") as f:
class_list = [cname.strip() for cname in f.readlines()]
return class_list
class_list = load_classes()
def wrap_detection(input_image, output_data):
class_ids = []
confidences = []
boxes = []
rows = output_data.shape[0]
image_width, image_height, _ = input_image.shape
x_factor = image_width / INPUT_WIDTH
y_factor = image_height / INPUT_HEIGHT
for r in range(rows):
row = output_data[r]
confidence = row[4]
if confidence >= 0.4:
classes_scores = row[5:]
_, _, _, max_indx = cv2.minMaxLoc(classes_scores)
class_id = max_indx[1]
if (classes_scores[class_id] > .25):
confidences.append(confidence)
class_ids.append(class_id)
x, y, w, h = row[0].item(), row[1].item(), row[2].item(), row[3].item()
left = int((x - 0.5 * w) * x_factor)
top = int((y - 0.5 * h) * y_factor)
width = int(w * x_factor)
height = int(h * y_factor)
box = np.array([left, top, width, height])
boxes.append(box)
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.25, 0.45)
result_class_ids = []
result_confidences = []
result_boxes = []
for i in indexes:
result_confidences.append(confidences[i])
result_class_ids.append(class_ids[i])
result_boxes.append(boxes[i])
return result_class_ids, result_confidences, result_boxes
def format_yolov5(frame):
row, col, _ = frame.shape
_max = max(col, row)
result = np.zeros((_max, _max, 3), np.uint8)
result[0:row, 0:col] = frame
return result
colors = [(255, 255, 0), (0, 255, 0), (0, 255, 255), (255, 0, 0)]
is_cuda = len(sys.argv) > 1 and sys.argv[1] == "cuda"
net = build_model(is_cuda)
capture = load_capture()
start = time.time_ns()
frame_count = 0
total_frames = 0
fps = -1
while True:
_, frame = capture.read()
if frame is None:
print("End of stream")
break
inputImage = format_yolov5(frame)
outs = detect(inputImage, net)
class_ids, confidences, boxes = wrap_detection(inputImage, outs[0])
frame_count += 1
total_frames += 1
for (classid, confidence, box) in zip(class_ids, confidences, boxes):
color = colors[int(classid) % len(colors)]
cv2.rectangle(frame, box, color, 2)
cv2.rectangle(frame, (box[0], box[1] - 20), (box[0] + box[2], box[1]), color, -1)
cv2.putText(frame, class_list[classid], (box[0], box[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, .5, (0,0,0))
if frame_count >= 30:
end = time.time_ns()
fps = 1000000000 * frame_count / (end - start)
frame_count = 0
start = time.time_ns()
if fps > 0:
fps_label = "FPS: %.2f" % fps
cv2.putText(frame, fps_label, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
cv2.imshow("output", frame)
if cv2.waitKey(1) > -1:
print("finished by user")
break
print("Total frames: " + str(total_frames))
C++ version:
#include <fstream>
#include <opencv2/opencv.hpp>
std::vector<std::string> load_class_list()
{
std::vector<std::string> class_list;
std::ifstream ifs("config_files/classes.txt");
std::string line;
while (getline(ifs, line))
{
class_list.push_back(line);
}
return class_list;
}
void load_net(cv::dnn::Net &net, bool is_cuda)
{
auto result = cv::dnn::readNet("config_files/yolov5s.onnx");
if (is_cuda)
{
std::cout << "Attempty to use CUDA\n";
result.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
result.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA_FP16);
}
else
{
std::cout << "Running on CPU\n";
result.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
result.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
}
net = result;
}
const std::vector<cv::Scalar> colors = {cv::Scalar(255, 255, 0), cv::Scalar(0, 255, 0), cv::Scalar(0, 255, 255), cv::Scalar(255, 0, 0)};
const float INPUT_WIDTH = 640.0;
const float INPUT_HEIGHT = 640.0;
const float SCORE_THRESHOLD = 0.2;
const float NMS_THRESHOLD = 0.4;
const float CONFIDENCE_THRESHOLD = 0.4;
struct Detection
{
int class_id;
float confidence;
cv::Rect box;
};
cv::Mat format_yolov5(const cv::Mat &source) {
int col = source.cols;
int row = source.rows;
int _max = MAX(col, row);
cv::Mat result = cv::Mat::zeros(_max, _max, CV_8UC3);
source.copyTo(result(cv::Rect(0, 0, col, row)));
return result;
}
void detect(cv::Mat &image, cv::dnn::Net &net, std::vector<Detection> &output, const std::vector<std::string> &className) {
cv::Mat blob;
auto input_image = format_yolov5(image);
cv::dnn::blobFromImage(input_image, blob, 1./255., cv::Size(INPUT_WIDTH, INPUT_HEIGHT), cv::Scalar(), true, false);
net.setInput(blob);
std::vector<cv::Mat> outputs;
net.forward(outputs, net.getUnconnectedOutLayersNames());
float x_factor = input_image.cols / INPUT_WIDTH;
float y_factor = input_image.rows / INPUT_HEIGHT;
float *data = (float *)outputs[0].data;
const int dimensions = 85;
const int rows = 25200;
std::vector<int> class_ids;
std::vector<float> confidences;
std::vector<cv::Rect> boxes;
for (int i = 0; i < rows; ++i) {
float confidence = data[4];
if (confidence >= CONFIDENCE_THRESHOLD) {
float * classes_scores = data + 5;
cv::Mat scores(1, className.size(), CV_32FC1, classes_scores);
cv::Point class_id;
double max_class_score;
minMaxLoc(scores, 0, &max_class_score, 0, &class_id);
if (max_class_score > SCORE_THRESHOLD) {
confidences.push_back(confidence);
class_ids.push_back(class_id.x);
float x = data[0];
float y = data[1];
float w = data[2];
float h = data[3];
int left = int((x - 0.5 * w) * x_factor);
int top = int((y - 0.5 * h) * y_factor);
int width = int(w * x_factor);
int height = int(h * y_factor);
boxes.push_back(cv::Rect(left, top, width, height));
}
}
data += 85;
}
std::vector<int> nms_result;
cv::dnn::NMSBoxes(boxes, confidences, SCORE_THRESHOLD, NMS_THRESHOLD, nms_result);
for (int i = 0; i < nms_result.size(); i++) {
int idx = nms_result[i];
Detection result;
result.class_id = class_ids[idx];
result.confidence = confidences[idx];
result.box = boxes[idx];
output.push_back(result);
}
}
int main(int argc, char **argv)
{
std::vector<std::string> class_list = load_class_list();
cv::Mat frame;
cv::VideoCapture capture("sample.mp4");
if (!capture.isOpened())
{
std::cerr << "Error opening video file\n";
return -1;
}
bool is_cuda = argc > 1 && strcmp(argv[1], "cuda") == 0;
cv::dnn::Net net;
load_net(net, is_cuda);
auto start = std::chrono::high_resolution_clock::now();
int frame_count = 0;
float fps = -1;
int total_frames = 0;
while (true)
{
capture.read(frame);
if (frame.empty())
{
std::cout << "End of stream\n";
break;
}
std::vector<Detection> output;
detect(frame, net, output, class_list);
frame_count++;
total_frames++;
int detections = output.size();
for (int i = 0; i < detections; ++i)
{
auto detection = output[i];
auto box = detection.box;
auto classId = detection.class_id;
const auto color = colors[classId % colors.size()];
cv::rectangle(frame, box, color, 3);
cv::rectangle(frame, cv::Point(box.x, box.y - 20), cv::Point(box.x + box.width, box.y), color, cv::FILLED);
cv::putText(frame, class_list[classId].c_str(), cv::Point(box.x, box.y - 5), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 0));
}
if (frame_count >= 30)
{
auto end = std::chrono::high_resolution_clock::now();
fps = frame_count * 1000.0 / std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
frame_count = 0;
start = std::chrono::high_resolution_clock::now();
}
if (fps > 0)
{
std::ostringstream fps_label;
fps_label << std::fixed << std::setprecision(2);
fps_label << "FPS: " << fps;
std::string fps_label_str = fps_label.str();
cv::putText(frame, fps_label_str.c_str(), cv::Point(10, 25), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(0, 0, 255), 2);
}
cv::imshow("output", frame);
if (cv::waitKey(1) != -1)
{
capture.release();
std::cout << "finished by user\n";
break;
}
}
std::cout << "Total frames: " << total_frames << "\n";
return 0;
}
More details can be found in this repository: https://github.com/doleron/yolov5-opencv-cpp-python
@doleron thanks for the examples! I've added a link to your repo on the export tutorial in https://docs.ultralytics.com/yolov5/tutorials/model_export
@doleron I think YOLOv5 expects inputs in [0, 1] without any mean subtraction, just dividing by 255 should be enough.
blob = cv2.dnn.blobFromImage(image, 1/255.0, (INPUT_WIDTH, INPUT_HEIGHT), swapRB=True, crop=False)
@SamFC10 You're right. I just edited the code. Thanks!
I have to run YOLOv5 for my project but I don't know how to run it. Previously we used OpenCV to load the model, labels and weights, but YOLOv5 does not support this structure. Can anybody help me with it?
@alimousavi1377 YOLOv5 does support this structure. Check https://github.com/ultralytics/yolov5/issues/239#issuecomment-890400768 and https://github.com/ultralytics/yolov5/issues/6309#issuecomment-1019403257 for runnable examples of using YOLOv5 with built-in/custom models. In addition, if you really want to use OpenCV, check the C++/Python example a few replies above to learn how to use .onnx files, OpenCV and YOLOv5.
Thanks guys for this thread, it helped me a lot. One question though: any ideas how to use the YOLOv5 augment
feature when running ONNX via cv2? Or would I need to implement it manually in my own code?
Hi @haimat! As far as I understand, data augmentation only makes sense at training time. Once training is finished, the final model structure/topology does not reflect any of the augmentation hyperparameters set for training; the only influence of augmentation is on dataset preparation, to achieve better weight generalization. In summary, IMO no action is needed for the ONNX conversion or during later model usage. PS: I'm only a YOLO user, so please wait for a more accurate/reliable answer from the Ultralytics team. PS2: are you facing some specific ONNX conversion error?
@haimat Test Time Augmentation (TTA) flag --augment
is only applied to PyTorch and TorchScript inference:
https://github.com/ultralytics/yolov5/blob/1ff43702a8dea05b0d1140d4bb2cf6a2fe3e3ad4/models/common.py#L395-L400
@doleron see TTA tutorial for more info:
Good luck 🍀 and let us know if you have any other questions!
@glenn-jocher Thanks for your reply, I was expecting something like that. So in other words, if I would like to use cv2+ONNX+TTA I would need to implement the TTA part in my own code, right?
@haimat well that's an option. The TTA code can also be in the DetectMultiBackend() forward method. It just depends on what level the code is, right now it's at a low level inside the torch and torchvision models.
Hi all, If you are looking for a thorough analysis and implementation of Yolov5 with OpenCV DNN, check out our LearnOpenCV blog post here.
@glenn-jocher
python detect.py --weights best.onnx --dnn --source 0
When I use the above command, it works and detects on my custom dataset well. The problem is that it shows the class label as "person", but my custom dataset has only one class, labeled "ball". How can I change it to "ball"?
@akbarali2019 for ONNX inference class names are handled automatically. For DNN inference you must pass your --data yaml to detect.py to retrieve class names:
python detect.py --data DATA.yaml
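Putting the pieces together for the custom single-class model above, the full command would presumably look something like this (assuming the dataset yaml used for training is named data.yaml):
python detect.py --weights best.onnx --dnn --source 0 --data data.yaml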
When I run this code on my own custom ONNX file I get this error:
File "C:\Users\acer\.spyder-py3\metallic surface defect detection\untitled3.py", line 57, in wrap_detection
if confidence >= 0.4:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Can anybody help me fix this?
def wrap_detection(input_image, output_data):
    class_ids = []
    confidences = []
    boxes = []
rows = output_data.shape[0]
image_width, image_height, _ = input_image.shape
x_factor = image_width / INPUT_WIDTH
y_factor = image_height / INPUT_HEIGHT
for r in range(rows):
row = output_data[r]
confidence = row[4]
if confidence >= 0.4:
classes_scores = row[5:]
_, _, _, max_indx = cv2.minMaxLoc(classes_scores)
class_id = max_indx[1]
if (classes_scores[class_id] > .25):
confidences.append(confidence)
class_ids.append(class_id)
x, y, w, h = row[0].item(), row[1].item(), row[2].item(), row[3].item()
left = int((x - 0.5 * w) * x_factor)
top = int((y - 0.5 * h) * y_factor)
width = int(w * x_factor)
height = int(h * y_factor)
box = np.array([left, top, width, height])
boxes.append(box)
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.25, 0.45)
result_class_ids = []
result_confidences = []
result_boxes = []
for i in indexes:
result_confidences.append(confidences[i])
result_class_ids.append(class_ids[i])
result_boxes.append(boxes[i])
return result_class_ids, result_confidences, result_boxes
Please, what is the problem??
@alkhalisy,
Just check the shape of the outs once.
In my case, I had to format the code the following way. Check out the source.
def post_process(input_image, outputs):
# Lists to hold respective values while unwrapping.
class_ids = []
confidences = []
boxes = []
# Rows.
rows = outputs[0].shape[1]
image_height, image_width = input_image.shape[:2]
# Resizing factor.
x_factor = image_width / INPUT_WIDTH
y_factor = image_height / INPUT_HEIGHT
# Iterate through detections.
for r in range(rows):
row = outputs[0][0][r]
confidence = row[4]
# Discard bad detections and continue.
if confidence >= CONFIDENCE_THRESHOLD:
classes_scores = row[5:]
# Get the index of max class score.
class_id = np.argmax(classes_scores)
# Continue if the class score is above threshold.
if (classes_scores[class_id] > SCORE_THRESHOLD):
confidences.append(confidence)
class_ids.append(class_id)
cx, cy, w, h = row[0], row[1], row[2], row[3]
left = int((cx - w/2) * x_factor)
top = int((cy - h/2) * y_factor)
width = int(w * x_factor)
height = int(h * y_factor)
box = np.array([left, top, width, height])
boxes.append(box)
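For completeness, a hedged sketch of how the rest of post_process() could look, mirroring the earlier Python example (the NMS call and the threshold constants are assumptions based on that example):
    # after the loop: non-maximum suppression over the collected boxes
    indices = cv2.dnn.NMSBoxes(boxes, confidences, CONFIDENCE_THRESHOLD, NMS_THRESHOLD)
    result_class_ids = [class_ids[i] for i in indices]
    result_confidences = [confidences[i] for i in indices]
    result_boxes = [boxes[i] for i in indices]
    return result_class_ids, result_confidences, result_boxes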
Dear Kukil, thanks for your response. The code above, from your repository, is the format for the function, but the same error still happens: "if confidence >= CONFIDENCE_THRESHOLD: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". Can you please help me with this?
🚀 Feature
Is there any way I can use YOLOv5 with OpenCV DNN?