Closed timarnoldev closed 2 months ago
@timarnoldev Below is an inference accuracy comparison using YOLOv8s as an example. The TensorRT-YOLO inference engine shows a slight accuracy loss compared with Ultralytics inference on the pt model. tensorrt_yolo version 3.0 and TensorRT-YOLO use letterbox and gpuBilinearWarpAffine for preprocessing, respectively, and their inference results show consistent accuracy; the letterbox implementation in tensorrt_yolo version 3.0 is derived from Ultralytics' LetterBox. Since the two preprocessing methods give the same results, preprocessing can be ruled out, and the main cause of the accuracy loss appears to be the engine exported by trtexec with the Efficient NMS Plugin. Reference: Efficient NMS Plugin Limitations.
As for the significant accuracy discrepancy you have shown, I find it puzzling. Please check if your processing workflow is consistent with mine.
from ultralytics import YOLO
model = YOLO("D:/Models/YOLOv8/yolov8s.pt")
model.predict("D:/Downloads/coco128/images/train2017/000000000077.jpg", save=True, imgsz=640, conf=0.25, iou=0.45, max_det=100, half=True, device="0")
from ultralytics import YOLO
model = YOLO("D:/Models/YOLOv8/yolov8s.pt")
model.predict("D:/Downloads/coco128/images/train2017/000000000077.jpg", save=True, imgsz=640, conf=0.25, iou=0.45, max_det=100, device="0")
trtyolo export -w yolov8s.pt -v yolov8 --imgsz 640 -b 1 --max_boxes 100 --iou_thres 0.45 --conf_thres 0.25 -o ./ -s
trtexec --onnx=yolov8s.onnx --saveEngine=yolov8s-fp16.engine --fp16
xmake run -P . detect -e D:/Models/YOLOv8/yolov8s-fp16.engine -i D:/Downloads/coco128/images/train2017/000000000077.jpg -o ./ -l labels.txt
trtyolo export -w yolov8s.pt -v yolov8 --imgsz 640 -b 1 --max_boxes 100 --iou_thres 0.45 --conf_thres 0.25 -o ./ -s
trtexec --onnx=yolov8s.onnx --saveEngine=yolov8s-fp32.engine
xmake run -P . detect -e D:/Models/YOLOv8/yolov8s-fp32.engine -i D:/Downloads/coco128/images/train2017/000000000077.jpg -o ./ -l labels.txt
pip install tensorrt_yolo
trtyolo export -w yolov8s.pt -v yolov8 --imgsz 640 -b 1 --max_boxes 100 --iou_thres 0.45 --conf_thres 0.25 -o ./ -s
trtexec --onnx=yolov8s.onnx --saveEngine=yolov8s-fp16.engine --fp16
trtyolo infer -e D:/Models/YOLOv8/yolov8s-fp16.engine -i D:/Downloads/coco128/images/train2017/000000000077.jpg -o ./ -l labels.txt
pip install tensorrt_yolo
trtyolo export -w yolov8s.pt -v yolov8 --imgsz 640 -b 1 --max_boxes 100 --iou_thres 0.45 --conf_thres 0.25 -o ./ -s
trtexec --onnx=yolov8s.onnx --saveEngine=yolov8s-fp32.engine
trtyolo infer -e D:/Models/YOLOv8/yolov8s-fp32.engine -i D:/Downloads/coco128/images/train2017/000000000077.jpg -o ./ -l labels.txt
Thanks for your detailed reply @laugh12321. I use the engine file for the Python version as well as the C++ version, but with your EfficientNMS plugin as discussed in #38. Is it mandatory to use this plugin? Nvidia seems to have deprecated it: https://github.com/NVIDIA/TensorRT/blob/release/10.1/plugin/efficientNMSPlugin/README.md
Maybe the problem is in the way preprocessing is handled. I used Roboflow for dataset preparation, which compresses the images down to 640x640. How is image scaling managed in TensorRT-YOLO?
@timarnoldev For the TensorRT-YOLO project, the EfficientNMS plugin is mandatory. It replaces the use of CUDA Kernels for post-processing, thereby improving inference speed. In future versions, we will consider replacing the EfficientNMS plugin with the INMSLayer.
Regarding the preprocessing operations mentioned for Roboflow, I am not very familiar with them. I use the same preprocessing method as Ultralytics, which involves scaling the images while maintaining the aspect ratio. You can refer to this article for more information: https://medium.com/@mattia.digiusto/optimising-image-pre-processing-in-python-ac9157951bf6
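To make "scaling the images while maintaining the aspect ratio" concrete, here is a minimal sketch of the letterbox geometry. It is not the Ultralytics or TensorRT-YOLO implementation; the function names letterbox_params and to_original are hypothetical, and centered padding is an assumption.

```python
def letterbox_params(src_w, src_h, dst_w=640, dst_h=640):
    """Return (scale, pad_x, pad_y) for an aspect-preserving resize.

    Assumption: the resized image is centered in the destination and
    the remaining border is filled with padding.
    """
    # One scale for both axes, so the aspect ratio is preserved.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) / 2
    pad_y = (dst_h - new_h) / 2
    return scale, pad_x, pad_y


def to_original(box, scale, pad_x, pad_y):
    """Map a (left, top, right, bottom) box from network-input space
    back to source-image pixels by undoing the padding and the scale."""
    left, top, right, bottom = box
    return (
        (left - pad_x) / scale,
        (top - pad_y) / scale,
        (right - pad_x) / scale,
        (bottom - pad_y) / scale,
    )


# A 1280x720 frame going into a 640x640 network input.
scale, pad_x, pad_y = letterbox_params(1280, 720)
full_frame = to_original((0, 140, 640, 500), scale, pad_x, pad_y)
```

By contrast, a plain stretch to 640x640 would scale each axis independently (0.5 horizontally and about 0.89 vertically for a 1280x720 frame), so objects would reach the network with a different aspect ratio than under letterboxing.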
I just checked, I'm also using letterbox for the python program. Btw this is how i converted .pt > .onnx back then: https://github.com/triple-Mu/YOLOv8-TensorRT?tab=readme-ov-file#export-end2end-onnx-with-nms
@timarnoldev The conversion methods for TensorRT-YOLO and YOLOv8-TensorRT models are the same, and the inference results are identical.
Convert with YOLOv8-TensorRT, Inference with TensorRT-YOLO FP16
git clone https://github.com/triple-Mu/YOLOv8-TensorRT.git
cd YOLOv8-TensorRT
python export-det.py --weights D:\Models\YOLOv8\yolov8s.pt --iou-thres 0.45 --conf-thres 0.25 --topk 100 --opset 11 --sim --input-shape 1 3 640 640 --device cuda:0
cd D:\Models\YOLOv8
trtexec --onnx=yolov8s.onnx --saveEngine=yolov8s-fp16.engine --fp16
xmake run -P . detect -e D:/Models/YOLOv8/yolov8s-fp16.engine -i D:/Downloads/coco128/images/train2017/000000000077.jpg -o ./ -l labels.txt
In the image below, yolov8s.onnx was exported using YOLOv8-TensorRT, while yolov8s-old.onnx was exported using TensorRT-YOLO. The two models are identical except for the output node names.
Good to know, thank you. Could the TensorRT version be the issue? With Python I used 8.6.1, and now I'm using 10.2.0. I'm so confused right now because in both cases I used the exact same .pt file.
@timarnoldev It shouldn't be an issue with the TensorRT version. The precision should be the same between 8.6.1 and 10.2.0. To better diagnose the problem, could you please send me your .pt model and test data in a ZIP archive? This way, I can help you analyze it in more detail.
I also just found out, triple-Mu/YOLOv8-TensorRT also has a c++ inference example. This works just fine with the expected results, but only supports yolov8.
@timarnoldev I used the provided model and data to perform inference with TensorRT-YOLO, and the accuracy of the results is normal. The vis.zip file contains the visualized results of the inference.
That is strange. This is the code I used for inference. Could the problem be here?
#include <QCoreApplication>
#include <iostream>
#include <memory>
#include <random>
#include <string>
#include <utility>
#include <vector>
#include <opencv2/opencv.hpp>
#include "AIWorker.h"
#include "tensorrt/deploy/vision/detection.hpp"

AIWorker::AIWorker(QObject *parent)
    : QObject(parent) {
    m_running = false;
}

// Build the (label, color) table; the index must match the model's class id
std::vector<std::pair<std::string, cv::Scalar>> generateLabelColorPairs() {
    std::vector<std::pair<std::string, cv::Scalar>> labelColorPairs;
    auto generateRandomColor = []() {
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<int> dis(0, 255);
        return cv::Scalar(dis(gen), dis(gen), dis(gen));
    };
    labelColorPairs.emplace_back("ball", generateRandomColor());
    labelColorPairs.emplace_back("player_red", generateRandomColor());
    return labelColorPairs;
}

// Visualize detection results
void visualize(cv::Mat& image, const deploy::DetectionResult& result, const std::vector<std::pair<std::string, cv::Scalar>>& labelColorPairs) {
    for (size_t i = 0; i < result.num; ++i) {
        const auto& box = result.boxes[i];
        int cls = result.classes[i];
        float score = result.scores[i];
        const auto& label = labelColorPairs[cls].first;
        const auto& color = labelColorPairs[cls].second;
        std::string labelText = label + " " + cv::format("%.2f", score);
        // Draw rectangle and label
        int baseLine;
        cv::Size labelSize = cv::getTextSize(labelText, cv::FONT_HERSHEY_SIMPLEX, 0.6, 1, &baseLine);
        cv::rectangle(image, cv::Point(box.left, box.top), cv::Point(box.right, box.bottom), color, 2, cv::LINE_AA);
        cv::rectangle(image, cv::Point(box.left, box.top - labelSize.height), cv::Point(box.left + labelSize.width, box.top), color, -1);
        cv::putText(image, labelText, cv::Point(box.left, box.top), cv::FONT_HERSHEY_SIMPLEX, 0.6, cv::Scalar(255, 255, 255), 1);
    }
}

void AIWorker::Start() {
    m_running = true;
    std::shared_ptr<deploy::BaseDet> model = std::make_shared<deploy::DeployDet>("../ai.engine");
    std::vector<std::pair<std::string, cv::Scalar>> labels = generateLabelColorPairs();
    while (m_running) {
        // Let Qt deliver queued signals (e.g. new frames) before each inference
        QCoreApplication::processEvents(QEventLoop::WaitForMoreEvents, 1);
        deploy::Image image(currentImage.data, currentImage.cols, currentImage.rows);
        auto result = model->predict(image);
        visualize(currentImage, result, labels);
        emit imageAnalyzed(currentImage, currentImageID);
        //cv::imshow("AI", currentImage);
        nextid++;
    }
}

int AIWorker::getCurrentImageId() {
    return nextid;
}

void AIWorker::onImageReceivedAr(cv::Mat image, int id) {
    this->currentImage = image.clone();
    this->currentImageID = id;
    this->imagePresent = true;
}

void AIWorker::stop() {
    m_running = false;
}
@timarnoldev You might want to first try TensorRT-YOLO's demo/detect to verify that the accuracy is correct. All the C++ inference results we discussed earlier were obtained using this demo/detect.
@timarnoldev To clarify, NVIDIA/TensorRT has deprecated the EfficientNMSONNXPlugin plugin, not the EfficientNMS_TRT plugin. In fact, the efficientNMSPlugin defines two plugins: EfficientNMS_TRT and EfficientNMSONNXPlugin.
Based on testing, the inference accuracy is consistent whether using EfficientNMS_TRT, EfficientNMSONNXPlugin, or INMSLayer.
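For readers unfamiliar with what these post-processing options compute, here is a rough sketch of greedy non-maximum suppression in plain Python. It is not the plugin's code: the function names are hypothetical, the parameters only mirror the --conf_thres, --iou_thres and --max_boxes export flags used in this thread, and it is class-agnostic for brevity, which may differ from how the plugins handle classes.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_thres=0.25, iou_thres=0.45, max_boxes=100):
    """Greedy class-agnostic NMS (illustrative only).

    Returns the indices of the kept boxes, highest score first.
    """
    # Drop low-confidence candidates, then visit the rest best-first.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
            if len(keep) == max_boxes:
                break
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.68 and is suppressed, and the fourth falls below the confidence threshold, so only indices 0 and 2 survive.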
I just found the problem. I was preprocessing the images myself, and that preprocessing didn't match what was used in training. Thank you very much for your help.
Running the exact same model (tested with v8, v9 and v10) with this library results in very bad detection results. I converted them from .pt -> .onnx using trtyolo and the custom yolov10 repo respectively. Some object classes aren't detected at all. Sometimes there are rare detections of random objects with very low confidences. On the other hand, running the model using TensorRT in Python delivers perfect results.
TensorRT-YOLO
Python Tensorrt Inference
Any ideas what the problem might be?
Export command I used:
trtyolo export -w best.pt -v yolov8 -o output --max_boxes 100 --iou_thres 0.45 --conf_thres 0.15 -b -1