Open Killuagg opened 1 week ago
👋 Hello @Killuagg, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!
Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.
Check out our YOLOv8 Docs for details and get started with:
pip install ultralytics
@Killuagg hi there,
Thank you for reaching out and for providing details about your setup and issue. To help you increase the FPS for your camera capture on the Raspberry Pi 4B, here are a few suggestions:
Verify Latest Versions: Ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.
Optimize Model Inference:
sudo apt-get install -y libopenblas-base libopenmpi-dev
wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt -O yolov5s.pt
python3 export.py --weights yolov5s.pt --img 640 --batch 1 --device 0 --include engine
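This generates a TensorRT engine file that you can use for inference. Note that TensorRT export and --device 0 require an NVIDIA GPU, so this step does not apply to a CPU-only Raspberry Pi 4B.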
Reduce Image Size: Lowering the image size can help increase FPS. You can try reducing the --img parameter to 320 or even lower, depending on your accuracy requirements:
python detect.py --weights best.onnx --img 320 --conf 0.7 --source 0
Use a More Efficient Model: If you are using yolov5s, you might want to try yolov5n (nano), which is designed to be more lightweight and faster, though with a potential trade-off in accuracy:
python detect.py --weights yolov5n.onnx --img 640 --conf 0.7 --source 0
Optimize Code: Ensure that your code is optimized for performance. For example, make sure that the webcam capture and model inference are not blocking each other. You can use threading to handle webcam capture and inference in parallel.
Hardware Acceleration: Ensure that you are utilizing hardware acceleration available on the Raspberry Pi. This includes enabling OpenCV with hardware acceleration and using appropriate libraries that leverage the GPU.
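As a rough guide to the image-size suggestion above: the work done by a convolutional detector grows roughly with the number of input pixels. The sketch below is only that back-of-the-envelope estimate, not a benchmark; `relative_cost` is a hypothetical helper, and real speedups on a Raspberry Pi will differ because of fixed overheads.

```python
def relative_cost(img_size: int, baseline: int = 640) -> float:
    """Approximate compute for a square input, relative to the baseline size."""
    return (img_size / baseline) ** 2


for size in (640, 480, 320):
    # Prints the estimated fraction of work compared with --img 640
    print(f"--img {size}: ~{relative_cost(size):.2f}x the work of --img 640")
```

Under this approximation, halving the side length cuts the work to about a quarter, which is why --img 320 is usually the first thing to try on a CPU.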
If you continue to experience issues, please provide a minimal reproducible example of your code. This will help us investigate further. You can find more details on creating a minimal reproducible example here.
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
Thank you for your reply. First, when I try to run detect.py with --img 320, it produces the error: expected 620 not 320 size. So I can only run at 640 on my Raspberry Pi. If I want to run the TensorRT model on my Raspberry Pi, do I need to run it on the Raspberry Pi's GPU? The only device available is the CPU. Is there any code inside detect.py that limits my FPS?
Hi @Killuagg,
Thank you for your follow-up and for providing additional details. Let's address your concerns one by one.
The error you encountered (expected 620 not 320 size) suggests that the exported model has a fixed input size. An ONNX file exported without dynamic axes only accepts the size it was exported at, so to run at 320 you would need to re-export best.pt at that size with export.py (for example with --img 320 --include onnx). If you prefer to stay at 640, let's focus on optimizing other aspects.
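To see how the input shape fed to the model is decided, here is a simplified sketch of the letterbox shape arithmetic as I understand it from YOLOv5's dataloaders. `letterbox_shape` is a hypothetical helper written for illustration, not the library function, and it only computes shapes rather than resizing images.

```python
def letterbox_shape(frame_hw, new_size=640, stride=32, auto=True):
    """Return the (height, width) a frame has after YOLOv5-style letterboxing."""
    h, w = frame_hw
    r = min(new_size / h, new_size / w)            # resize ratio that fits the frame
    new_h, new_w = round(h * r), round(w * r)      # resized but not yet padded
    pad_h, pad_w = new_size - new_h, new_size - new_w
    if auto:
        # "Minimum rectangle": pad only up to the next multiple of the stride
        pad_h, pad_w = pad_h % stride, pad_w % stride
    return new_h + pad_h, new_w + pad_w


webcam = (480, 640)  # assumed webcam resolution (height, width)
print(letterbox_shape(webcam, 640, auto=True))    # rectangular input
print(letterbox_shape(webcam, 640, auto=False))   # square input
print(letterbox_shape(webcam, 320, auto=False))   # square input at the smaller size
```

A fixed-shape ONNX model needs the square variant at exactly the exported size, which is why a 320 request fails against a model exported at 640.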
TensorRT can provide significant performance improvements, but it requires an NVIDIA GPU. The Raspberry Pi 4B runs inference on its CPU, so TensorRT is not an option there. However, you can still try optimizing your setup:
Use a CPU-Friendly Format: Keep the model in a format with good CPU support, such as the ONNX export you already use. Other formats that detect.py accepts, such as TensorFlow Lite, may also be worth benchmarking on the Pi.
Optimize Inference Code: Ensure that your inference code is as efficient as possible. For example, you can use threading to handle webcam capture and model inference in parallel, reducing any potential bottlenecks.
Here's an example of how you might use threading to improve performance:
import cv2
import threading
import time
from yolov5 import YOLOv5

# Load model
model = YOLOv5("best.onnx")

# Function to capture frames
def capture_frames():
    global frame
    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        time.sleep(0.01)  # Adjust sleep time as needed

# Function to run inference
def run_inference():
    global frame
    while True:
        if frame is not None:
            results = model.predict(frame)
            # Process results
        time.sleep(0.01)  # Adjust sleep time as needed

# Start threads
frame = None
thread1 = threading.Thread(target=capture_frames)
thread2 = threading.Thread(target=run_inference)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
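A variation on the example above hands frames over through a one-slot queue, so the inference thread always works on the newest frame and stale frames are dropped. This is a sketch with stand-ins only: integers play the role of frames and `time.sleep` plays the role of slow inference, and `put_latest`, `capture`, `infer` and `run_demo` are hypothetical names.

```python
import queue
import threading
import time


def put_latest(q, item):
    """Put item into a size-1 queue, replacing any frame not yet consumed."""
    try:
        q.put_nowait(item)
    except queue.Full:
        try:
            q.get_nowait()          # discard the stale frame
        except queue.Empty:
            pass                    # the consumer took it in the meantime
        q.put_nowait(item)


def capture(q, n_frames):
    for i in range(n_frames):       # stand-in for cap.read()
        put_latest(q, i)
        time.sleep(0.001)
    put_latest(q, None)             # sentinel: no more frames


def infer(q, processed):
    while True:
        frame = q.get()
        if frame is None:
            break
        time.sleep(0.01)            # stand-in for slow model inference
        processed.append(frame)


def run_demo(n_frames=50):
    q = queue.Queue(maxsize=1)
    processed = []
    threads = [
        threading.Thread(target=capture, args=(q, n_frames)),
        threading.Thread(target=infer, args=(q, processed)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


print(run_demo())  # frame indices that were actually processed, in order
```

Dropping frames does not make inference itself faster, but it keeps the displayed result close to real time instead of lagging behind a growing backlog.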
Please ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.
If you continue to experience issues, please provide a minimal reproducible example of your code. This will help us investigate further. You can find more details on creating a minimal reproducible example here.
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
Thank you for sharing the info. May I know of another method that does not use TensorRT? I mean, is there a solution that only involves the CPU, not the GPU? Sorry for asking. Also, will training with 2000 images affect the FPS? I have another model trained with 800 images and the FPS is still the same.
Why is it that after I run detect.py with --source 0 (the webcam), the saved MP4 file cannot be played on my Raspberry Pi or on Windows 11?
Hi @Killuagg,
Thank you for your detailed follow-up! Let's address your questions and concerns step by step.
If you're looking to optimize your YOLOv5 model inference on a CPU-only setup, here are a few strategies you can employ:
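Model Quantization: Quantizing your model can improve inference speed by reducing the precision of the weights and activations. PyTorch's built-in dynamic quantization only converts the layer types you list (here torch.nn.Linear), and YOLOv5 consists almost entirely of convolutions, so expect little or no speedup from this particular snippet:
import torch
from torch.quantization import quantize_dynamic

# A YOLOv5 checkpoint is a dict; the network is stored under the 'model' key.
# Run this from the yolov5 repo root so the pickled classes can be imported.
ckpt = torch.load('best.pt', map_location='cpu')
model = ckpt['model'].float().eval()
quantized_model = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
torch.save(quantized_model, 'best_quantized.pt')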
Use a Smaller Model: If you're currently using yolov5s, consider switching to yolov5n (nano), which is designed to be more lightweight and faster:
python detect.py --weights yolov5n.pt --img 640 --conf 0.7 --source 0
Optimize Code Execution: Ensure that your code is optimized for performance. For example, using threading to handle webcam capture and model inference in parallel can help reduce bottlenecks.
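For intuition about what quantization in the list above does to the numbers, here is a minimal pure-Python sketch of 8-bit affine quantization. It illustrates the arithmetic only and is not how PyTorch implements it; `quantize` and `dequantize` are hypothetical helpers.

```python
def quantize(values, num_bits=8):
    """Affine-quantize floats to unsigned ints; return (ints, scale, zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)                    # range must include zero
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0      # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map the integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize(weights)
print(q, scale, zp)
print(dequantize(q, scale, zp))   # close to the original weights
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most about one quantization step.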
The number of images used for training (2000 vs. 800) does not affect the FPS during inference. The FPS is determined by the model size, the input image size, and the computational power of your device. A larger dataset can improve the model's accuracy, but it does not change the architecture, so the cost of processing each frame stays the same.
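One way to check this empirically is to time the same script with each model. The sketch below is a small smoothed FPS counter; `FPSMeter` is a hypothetical helper, and the `now` argument exists only so the example can be driven with fixed timestamps instead of a real clock.

```python
import time


class FPSMeter:
    """Exponentially smoothed frames-per-second estimate."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha      # weight given to the newest measurement
        self.fps = None
        self._last = None

    def tick(self, now=None):
        """Call once per processed frame; returns the current FPS estimate."""
        now = time.perf_counter() if now is None else now
        if self._last is not None:
            dt = now - self._last
            if dt > 0:
                inst = 1.0 / dt
                if self.fps is None:
                    self.fps = inst
                else:
                    self.fps = (1 - self.alpha) * self.fps + self.alpha * inst
        self._last = now
        return self.fps


meter = FPSMeter()
for t in (0.0, 0.5, 1.0, 1.5):        # one frame every half second
    fps = meter.tick(now=t)
print(fps)
```

Smoothing avoids the jitter of a single-frame 1 / dt reading, which makes it easier to compare two models or two image sizes.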
Regarding the MP4 file not playing on your Raspberry Pi and Windows 11, it could be related to the codec or to the way the video is being saved. The 'mp4v' FourCC used below selects MPEG-4 Part 2, which some players do not decode; H.264 is more widely supported when your OpenCV build can encode it. Here's an example of how to save the video:
import cv2

cap = cv2.VideoCapture(0)  # open the default webcam

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Use 'XVID' for .avi files
# The size given here should match the frames you write
out = cv2.VideoWriter('output.mp4', fourcc, 20.0, (640, 480))

while cap.isOpened():
    ret, frame = cap.read()
    if ret:
        # Write the frame
        out.write(frame)
    else:
        break

# Release everything if job is finished
cap.release()
out.release()
cv2.destroyAllWindows()
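A likely cause worth ruling out is that the writer is never released, for example when a webcam run is stopped with Ctrl+C, because an MP4 file is generally only finalized when the writer is closed. The sketch below shows one way to guarantee the release call. `releasing` and `record` are hypothetical helpers, and `DummyWriter` stands in for cv2.VideoWriter so the example runs without OpenCV.

```python
from contextlib import contextmanager


@contextmanager
def releasing(resource):
    """Yield the resource and always call its release() afterwards."""
    try:
        yield resource
    finally:
        resource.release()


class DummyWriter:
    """Stand-in for cv2.VideoWriter so the sketch runs without OpenCV."""

    def __init__(self):
        self.frames, self.released = [], False

    def write(self, frame):
        self.frames.append(frame)

    def release(self):
        self.released = True


def record(writer, frames):
    with releasing(writer) as out:
        for frame in frames:
            if frame is None:
                raise KeyboardInterrupt   # simulate Ctrl+C mid-stream
            out.write(frame)


writer = DummyWriter()
try:
    record(writer, [1, 2, None, 3])
except KeyboardInterrupt:
    pass
print(writer.released, writer.frames)
```

With a real cv2.VideoWriter in place of the dummy, the same pattern ensures out.release() runs even when the loop is interrupted.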
To help us better understand and resolve your issue, could you please provide a minimal reproducible example of your code? This will allow us to reproduce the bug and investigate a solution. You can find more details on creating a minimal reproducible example here. This step is crucial for us to provide accurate and effective support.
Lastly, please ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
""" Run YOLOv5 detection inference on images, videos, directories, globs, YouTube, webcam, streams, etc.

Usage - sources:
    $ python detect.py --weights yolov5s.pt --source 0                               # webcam
                                                     img.jpg                         # image
                                                     vid.mp4                         # video
                                                     screen                          # screenshot
                                                     path/                           # directory
                                                     list.txt                        # list of images
                                                     list.streams                    # list of streams
                                                     'path/*.jpg'                    # glob
                                                     'https://youtu.be/LNwODJXcvt4'  # YouTube
                                                     'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream

Usage - formats:
    $ python detect.py --weights yolov5s.pt                 # PyTorch
                                 yolov5s.torchscript        # TorchScript
                                 yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                                 yolov5s_openvino_model     # OpenVINO
                                 yolov5s.engine             # TensorRT
                                 yolov5s.mlmodel            # CoreML (macOS-only)
                                 yolov5s_saved_model        # TensorFlow SavedModel
                                 yolov5s.pb                 # TensorFlow GraphDef
                                 yolov5s.tflite             # TensorFlow Lite
                                 yolov5s_edgetpu.tflite     # TensorFlow Edge TPU
                                 yolov5s_paddle_model       # PaddlePaddle
"""

import argparse
import csv
import os
import platform
import sys
from pathlib import Path

import torch
import time

import pyttsx3

engine = pyttsx3.init()

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative

from ultralytics.utils.plotting import Annotator, colors, save_one_box

from models.common import DetectMultiBackend
from utils.dataloaders import IMG_FORMATS, VID_FORMATS, LoadImages, LoadScreenshots, LoadStreams
from utils.general import (
    LOGGER,
    Profile,
    check_file,
    check_img_size,
    check_imshow,
    check_requirements,
    colorstr,
    cv2,
    increment_path,
    non_max_suppression,
    print_args,
    scale_boxes,
    strip_optimizer,
    xyxy2xywh,
)
from utils.torch_utils import select_device, smart_inference_mode


@smart_inference_mode()
def run(
    weights=ROOT / "best.onnx",  # model path or triton URL
    source=ROOT / "Data/images",  # file/dir/URL/glob/screen/0(webcam)
    data=ROOT / "data.yaml",  # dataset.yaml path
    imgsz=(640, 640),  # inference size (height, width)
    conf_thres=0.25,  # confidence threshold
    iou_thres=0.45,  # NMS IOU threshold
    max_det=1000,  # maximum detections per image
    device="",  # cuda device, i.e. 0 or 0,1,2,3 or cpu
    view_img=False,  # show results
    save_txt=False,  # save results to *.txt
    save_csv=False,  # save results in CSV format
    save_conf=False,  # save confidences in --save-txt labels
    save_crop=False,  # save cropped prediction boxes
    nosave=False,  # do not save images/videos
    classes=None,  # filter by class: --class 0, or --class 0 2 3
    agnostic_nms=False,  # class-agnostic NMS
    augment=False,  # augmented inference
    visualize=False,  # visualize features
    update=False,  # update all models
    project=ROOT / "runs/detect",  # save results to project/name
    name="exp",  # save results to project/name
    exist_ok=False,  # existing project/name ok, do not increment
    line_thickness=3,  # bounding box thickness (pixels)
    hide_labels=False,  # hide labels
    hide_conf=False,  # hide confidences
    half=False,  # use FP16 half-precision inference
    dnn=False,  # use OpenCV DNN for ONNX inference
    vid_stride=1,  # video frame-rate stride
):
    source = str(source)
    save_img = not nosave and not source.endswith(".txt")  # save inference images
    is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)
    is_url = source.lower().startswith(("rtsp://", "rtmp://", "http://", "https://"))
    webcam = source.isnumeric() or source.endswith(".streams") or (is_url and not is_file)
    screenshot = source.lower().startswith("screen")
    if is_url and is_file:
        source = check_file(source)  # download

    # Directories
    save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
    (save_dir / "labels" if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Load model
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
    stride, names, pt = model.stride, model.names, model.pt
    imgsz = check_img_size(imgsz, s=stride)  # check image size

    # Dataloader
    bs = 1  # batch_size
    if webcam:
        view_img = check_imshow(warn=True)
        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
        bs = len(dataset)
    elif screenshot:
        dataset = LoadScreenshots(source, img_size=imgsz, stride=stride, auto=pt)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
    vid_path, vid_writer = [None] * bs, [None] * bs

    # FPS calculation
    prev_time = time.time()

    # Run inference
    model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))  # warmup
    seen, windows, dt = 0, [], (Profile(device=device), Profile(device=device), Profile(device=device))
    for path, im, im0s, vid_cap, s in dataset:
        current_time = time.time()
        fps = 1 / (current_time - prev_time)
        prev_time = current_time

        with dt[0]:
            im = torch.from_numpy(im).to(model.device)
            im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
            im /= 255  # 0 - 255 to 0.0 - 1.0
            if len(im.shape) == 3:
                im = im[None]  # expand for batch dim
            if model.xml and im.shape[0] > 1:
                ims = torch.chunk(im, im.shape[0], 0)

        # Inference
        with dt[1]:
            visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
            if model.xml and im.shape[0] > 1:
                pred = None
                for image in ims:
                    if pred is None:
                        pred = model(image, augment=augment, visualize=visualize).unsqueeze(0)
                    else:
                        pred = torch.cat((pred, model(image, augment=augment, visualize=visualize).unsqueeze(0)), dim=0)
                pred = [pred, None]
            else:
                pred = model(im, augment=augment, visualize=visualize)
        # NMS
        with dt[2]:
            pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

        # Second-stage classifier (optional)
        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)

        # Define the path for the CSV file
        csv_path = save_dir / "predictions.csv"

        # Create or append to the CSV file
        def write_to_csv(image_name, prediction, confidence):
            """Writes prediction data for an image to a CSV file, appending if the file exists."""
            data = {"Image Name": image_name, "Prediction": prediction, "Confidence": confidence}
            with open(csv_path, mode="a", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=data.keys())
                if not csv_path.is_file():
                    writer.writeheader()
                writer.writerow(data)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            seen += 1
            if webcam:  # batch_size >= 1
                p, im0, frame = path[i], im0s[i].copy(), dataset.count
                s += f"{i}: "
            else:
                p, im0, frame = path, im0s.copy(), getattr(dataset, "frame", 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # im.jpg
            txt_path = str(save_dir / "labels" / p.stem) + ("" if dataset.mode == "image" else f"_{frame}")  # im.txt
            s += "%gx%g " % im.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            imc = im0.copy() if save_crop else im0  # for save_crop
            annotator = Annotator(im0, line_width=line_thickness, example=str(names))
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, 5].unique():
                    n = (det[:, 5] == c).sum()  # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    c = int(cls)  # integer class
                    label = names[c] if hide_conf else f"{names[c]}"
                    confidence = float(conf)
                    confidence_str = f"{confidence:.2f}"

                    if save_csv:
                        write_to_csv(p.name, label, confidence_str)

                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
                        with open(f"{txt_path}.txt", "a") as f:
                            f.write(("%g " * len(line)).rstrip() % line + "\n")

                    if save_img or save_crop or view_img:  # Add bbox to image
                        c = int(cls)  # integer class
                        label = None if hide_labels else (names[c] if hide_conf else f"{names[c]} {conf:.2f}")
                        annotator.box_label(xyxy, label, color=colors(c, True))
                    if save_crop:
                        save_one_box(xyxy, imc, file=save_dir / "crops" / names[c] / f"{p.stem}.jpg", BGR=True)

            # Overlay FPS on the frame
            cv2.putText(im0, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)

            # Stream results
            im0 = annotator.result()
            if view_img:
                if platform.system() == "Linux" and p not in windows:
                    windows.append(p)
                    cv2.namedWindow(str(p), cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)  # allow window resize (Linux)
                    cv2.resizeWindow(str(p), im0.shape[1], im0.shape[0])
                cv2.imshow(str(p), im0)
                cv2.waitKey(1)  # 1 millisecond

            # Save results (image with detections)
            if save_img:
                if dataset.mode == "image":
                    cv2.imwrite(save_path, im0)
                else:  # 'video' or 'stream'
                    if vid_path[i] != save_path:  # new video
                        vid_path[i] = save_path
                        if isinstance(vid_writer[i], cv2.VideoWriter):
                            vid_writer[i].release()  # release previous video writer
                        if vid_cap:  # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                        save_path = str(Path(save_path).with_suffix(".mp4"))  # force *.mp4 suffix on results videos
                        vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
                    vid_writer[i].write(im0)

        # Print time (inference-only)
        LOGGER.info(f"{s}{'' if len(det) else '(no detections), '}{dt[1].dt * 1E3:.1f}ms")

        detections = []
        for *xyxy, conf, cls in reversed(det):
            detections.append({'label': names[int(cls)]})

        # Assuming 'detections' is your list of detected objects
        for det in detections:
            # Extract the label of the detected object
            label = det['label']
            print(f"Detected: {label}")  # Debugging print statement
            # Generate voice feedback
            engine.say(f"Detected {label}")
            engine.runAndWait()

    # Print results
    t = tuple(x.t / seen * 1e3 for x in dt)  # speeds per image
    LOGGER.info(f"Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}" % t)
    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ""
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}{s}")
    if update:
        strip_optimizer(weights[0])  # update model (to fix SourceChangeWarning)


def parse_opt():
    """Parses command-line arguments for YOLOv5 detection, setting inference options and model configurations."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--weights", nargs="+", type=str, default=ROOT / "yolov5s.pt", help="model path or triton URL")
    parser.add_argument("--source", type=str, default=ROOT / "data/images", help="file/dir/URL/glob/screen/0(webcam)")
    parser.add_argument("--data", type=str, default=ROOT / "data/coco128.yaml", help="(optional) dataset.yaml path")
    parser.add_argument("--imgsz", "--img", "--img-size", nargs="+", type=int, default=[640], help="inference size h,w")
    parser.add_argument("--conf-thres", type=float, default=0.25, help="confidence threshold")
    parser.add_argument("--iou-thres", type=float, default=0.45, help="NMS IoU threshold")
    parser.add_argument("--max-det", type=int, default=1000, help="maximum detections per image")
    parser.add_argument("--device", default="", help="cuda device, i.e. 0 or 0,1,2,3 or cpu")
    parser.add_argument("--view-img", action="store_true", help="show results")
    parser.add_argument("--save-txt", action="store_true", help="save results to .txt")
    parser.add_argument("--save-csv", action="store_true", help="save results in CSV format")
    parser.add_argument("--save-conf", action="store_true", help="save confidences in --save-txt labels")
    parser.add_argument("--save-crop", action="store_true", help="save cropped prediction boxes")
    parser.add_argument("--nosave", action="store_true", help="do not save images/videos")
    parser.add_argument("--classes", nargs="+", type=int, help="filter by class: --classes 0, or --classes 0 2 3")
    parser.add_argument("--agnostic-nms", action="store_true", help="class-agnostic NMS")
    parser.add_argument("--augment", action="store_true", help="augmented inference")
    parser.add_argument("--visualize", action="store_true", help="visualize features")
    parser.add_argument("--update", action="store_true", help="update all models")
    parser.add_argument("--project", default=ROOT / "runs/detect", help="save results to project/name")
    parser.add_argument("--name", default="exp", help="save results to project/name")
    parser.add_argument("--exist-ok", action="store_true", help="existing project/name ok, do not increment")
    parser.add_argument("--line-thickness", default=3, type=int, help="bounding box thickness (pixels)")
    parser.add_argument("--hide-labels", default=False, action="store_true", help="hide labels")
    parser.add_argument("--hide-conf", default=False, action="store_true", help="hide confidences")
    parser.add_argument("--half", action="store_true", help="use FP16 half-precision inference")
    parser.add_argument("--dnn", action="store_true", help="use OpenCV DNN for ONNX inference")
    parser.add_argument("--vid-stride", type=int, default=1, help="video frame-rate stride")
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    print_args(vars(opt))
    return opt


def main(opt):
    """Executes YOLOv5 model inference with given options, checking requirements before running the model."""
    check_requirements(ROOT / "requirements.txt", exclude=("tensorboard", "thop"))
    run(**vars(opt))


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)
I am using my modified detect1.py file from YOLOv5 PyTorch. I already followed the code you showed, but the video still cannot be played. Can you help me modify the code I shared?
Hi @Killuagg,
Thank you for sharing your detailed code and setup. Let's address your concerns step by step to ensure we can help you effectively.
The issue with the video not playing could be related to how the video is being saved or displayed. Let's ensure that the video is saved correctly and that the display logic is handled properly.
First, let's ensure that the video is saved correctly. The snippet below uses the 'mp4v' FourCC (MPEG-4 Part 2); H.264 is more widely supported if your OpenCV build can encode it:
import cv2

cap = cv2.VideoCapture(0)  # open the default webcam

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Use 'XVID' for .avi files
# The size given here should match the frames you write
out = cv2.VideoWriter('output.mp4', fourcc, 20.0, (640, 480))

while cap.isOpened():
    ret, frame = cap.read()
    if ret:
        # Write the frame
        out.write(frame)
    else:
        break

# Release everything if job is finished
cap.release()
out.release()
cv2.destroyAllWindows()
Next, let's ensure that the video display logic is handled correctly. Here's a simplified version of your detect.py script focusing on video display:
import cv2
import torch
from models.common import DetectMultiBackend
from utils.dataloaders import LoadStreams
from utils.general import check_img_size, non_max_suppression, scale_boxes
from utils.plots import Annotator, colors

# Load model
device = torch.device('cpu')  # Change to 'cuda' if you have a GPU
model = DetectMultiBackend('best.onnx', device=device)
stride, names = model.stride, model.names
imgsz = check_img_size((640, 640), s=stride)  # check image size

# Dataloader
source = '0'  # webcam
# auto=model.pt keeps a fixed 640x640 letterbox for non-PyTorch backends such as ONNX
dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=model.pt)

# Run inference
model.warmup(imgsz=(1, 3, *imgsz))  # warmup
stop = False
for path, im, im0s, vid_cap, s in dataset:
    im = torch.from_numpy(im).to(device)
    im = im.float() / 255.0  # 0 - 255 to 0.0 - 1.0
    if len(im.shape) == 3:
        im = im[None]  # expand for batch dim

    # Inference
    pred = model(im)

    # NMS
    pred = non_max_suppression(pred, 0.25, 0.45, None, False, max_det=1000)

    # Process predictions
    for i, det in enumerate(pred):  # per image
        im0 = im0s[i].copy()
        annotator = Annotator(im0, line_width=3, example=str(names))
        if len(det):
            det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()
            for *xyxy, conf, cls in reversed(det):
                label = f'{names[int(cls)]} {conf:.2f}'
                annotator.box_label(xyxy, label, color=colors(int(cls), True))
        im0 = annotator.result()

        # Display results (path is a list when reading streams)
        cv2.imshow(str(path[i]), im0)
        if cv2.waitKey(1) == ord('q'):  # 1 millisecond
            stop = True
    if stop:  # leave the outer loop as well
        break
cv2.destroyAllWindows()
Please ensure you are using the latest versions of torch and the YOLOv5 repository. This can sometimes resolve performance issues due to optimizations and bug fixes in newer releases.
If the issue persists, please provide a minimal reproducible example of your code. This will help us investigate further. You can find more details on creating a minimal reproducible example here. This step is crucial for us to provide accurate and effective support.
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
I am sorry. I am confused about where I need to place the code inside detect.py.
Hi @Killuagg,
Thank you for your patience and for providing more details about your setup. Let's clarify where to place the code within your detect.py script to ensure everything runs smoothly.
Here's a structured example of detect.py to guide you:
import argparse
import torch
import cv2
from models.common import DetectMultiBackend
from utils.dataloaders import LoadStreams
from utils.general import check_img_size, non_max_suppression, scale_boxes
from utils.plots import Annotator, colors

# Initialize the TTS engine
import pyttsx3
engine = pyttsx3.init()


# Define the main function
def run(weights='best.onnx', source='0', imgsz=(640, 640), conf_thres=0.25, iou_thres=0.45, max_det=1000, device='cpu', view_img=False):
    # Load model
    device = torch.device(device)
    model = DetectMultiBackend(weights, device=device)
    stride, names = model.stride, model.names
    imgsz = check_img_size(imgsz, s=stride)  # check image size

    # Dataloader (auto=model.pt keeps a fixed-size letterbox for ONNX models)
    dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=model.pt)

    # Run inference
    model.warmup(imgsz=(1, 3, *imgsz))  # warmup
    stop = False
    for path, im, im0s, vid_cap, s in dataset:
        im = torch.from_numpy(im).to(device)
        im = im.float() / 255.0  # 0 - 255 to 0.0 - 1.0
        if len(im.shape) == 3:
            im = im[None]  # expand for batch dim

        # Inference
        pred = model(im)

        # NMS
        pred = non_max_suppression(pred, conf_thres, iou_thres, None, False, max_det=max_det)

        # Process predictions
        for i, det in enumerate(pred):  # per image
            im0 = im0s[i].copy()
            annotator = Annotator(im0, line_width=3, example=str(names))
            if len(det):
                det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()
                for *xyxy, conf, cls in reversed(det):
                    label = f'{names[int(cls)]} {conf:.2f}'
                    annotator.box_label(xyxy, label, color=colors(int(cls), True))
            im0 = annotator.result()

            # Display results (path is a list when reading streams)
            if view_img:
                cv2.imshow(str(path[i]), im0)
                if cv2.waitKey(1) == ord('q'):  # 1 millisecond
                    stop = True

            # Generate voice feedback (runAndWait blocks until speech finishes)
            labels = [names[int(cls)] for *xyxy, conf, cls in reversed(det)]
            for label in labels:
                engine.say(f"Detected {label}")
                engine.runAndWait()
        if stop:  # leave the outer loop as well
            break
    cv2.destroyAllWindows()


# Define the argument parser
def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='best.onnx', help='model path')
    parser.add_argument('--source', type=str, default='0', help='source')
    parser.add_argument('--imgsz', type=int, nargs='+', default=[640, 640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='cpu', help='cuda device or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand a single size to h,w
    return opt


# Main entry point
if __name__ == "__main__":
    opt = parse_opt()
    run(**vars(opt))
Please ensure you are using the latest versions of torch and the YOLOv5 repository.
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
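One thing to keep in mind with the voice feedback in the example above: a blocking speech call per detection holds up the inference loop, which costs FPS. A possible sketch, under the assumption that speech can be handed to a background thread, is shown below. `Announcer` is a hypothetical helper, and `speak` is any callable you supply; whether pyttsx3 behaves well off the main thread depends on the platform driver and should be verified.

```python
import queue
import threading
import time


class Announcer:
    """Speak labels on a background thread, skipping recent repeats."""

    def __init__(self, speak, cooldown=3.0):
        self._speak = speak            # callable that takes the text to say
        self._cooldown = cooldown      # seconds before a label may repeat
        self._last = {}                # label -> time of last announcement
        self._q = queue.Queue()
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def announce(self, label, now=None):
        """Queue a label for speech; returns False if it was said recently."""
        now = time.monotonic() if now is None else now
        last = self._last.get(label)
        if last is not None and now - last < self._cooldown:
            return False
        self._last[label] = now
        self._q.put(label)
        return True

    def _worker(self):
        while True:
            label = self._q.get()
            if label is None:          # sentinel from close()
                break
            self._speak(f"Detected {label}")

    def close(self):
        self._q.put(None)
        self._thread.join()


spoken = []
announcer = Announcer(spoken.append, cooldown=3.0)  # list.append stands in for TTS
announcer.announce("stop", now=0.0)
announcer.announce("stop", now=1.0)    # suppressed: same label within 3 s
announcer.announce("yield", now=1.0)
announcer.close()
print(spoken)
```

In the detection loop you would call announcer.announce(label) instead of engine.say and engine.runAndWait, so the loop never waits for speech to finish.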
I have evaluated my model with val.py. The dataset consists of images extracted from video. When I test with a test dataset from Google, it gets high metrics. If I use a test dataset extracted from Raspberry Pi video, I only get 60% metrics. How can I improve it?
Hi @Killuagg,
Thank you for reaching out and sharing your evaluation results. Your model performs well on the test dataset from Google but noticeably worse on the frames extracted from Raspberry Pi video. Let's explore some potential reasons and solutions to improve your metrics:
Dataset Quality and Diversity:
Data Augmentation:
Model Fine-Tuning:
Hyperparameter Tuning:
Test-Time Augmentation (TTA): Add the --augment flag to your val.py command:
python val.py --weights yolov5x.pt --data coco.yaml --img 832 --augment --half
Evaluate on Latest Versions: Ensure you are using the latest versions of torch and the YOLOv5 repository. Updates often include performance improvements and bug fixes that could benefit your model's performance.
If you could provide a minimal reproducible example of your code, it would help us investigate further. You can find more details on creating a minimal reproducible example here. This step is crucial for us to provide accurate and effective support.
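To illustrate the idea behind test-time augmentation mentioned above: the model is run on transformed copies of the image, such as a left-right flip, and the predictions are mapped back to the original coordinates before being merged. The sketch below covers only that coordinate mapping for a horizontal flip. `unflip_box` is a hypothetical helper, and the way --augment combines scales and flips internally is not reproduced here.

```python
def unflip_box(box, img_width):
    """Map an (x1, y1, x2, y2) box from a left-right flipped image back."""
    x1, y1, x2, y2 = box
    # Flipping mirrors x coordinates, so the left and right edges swap roles
    return (img_width - x2, y1, img_width - x1, y2)


flipped_prediction = (10, 20, 110, 220)   # box found on the flipped image
print(unflip_box(flipped_prediction, img_width=640))
```

Applying the function twice returns the original box, which is a quick sanity check for this kind of mapping.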
Feel free to reach out if you have any more questions or need further assistance. The YOLO community and the Ultralytics team are always here to help!
Search before asking
YOLOv5 Component
Detection
Bug
Hi, I am currently trying to build traffic sign detection and recognition using YOLOv5 PyTorch with the YOLOv5s model. I am using the detect.py file to run the model, and I only get 1 FPS. The dataset contains around 2K images and the model was trained for 200 epochs. I run the code with: python detect.py --weights best.onnx --img 640 --conf 0.7 --source 0
Is there any modification to the code that would let me get more than 4 FPS?
Environment
- Raspberry Pi 4B with 8GB RAM
- Webcam
- Model: best.onnx
- Trained using YOLOv5 PyTorch
Minimal Reproducible Example
No response
Additional
No response
Are you willing to submit a PR?