@HeitorDC the camera Field Of View (FOV) characteristics have nothing to do with YOLOv5; they depend on the actual camera hardware used to record a given image and on the camera settings at the time of capture (e.g. zoom).
Hi, Mr. @glenn-jocher, thanks for your answer!!
On that note, is there any way to get the position of the central point of the bounding box in the frame with the YOLOv5 detect code?
@HeitorDC oh, yes, if you just want the xcenter, ycenter of a YOLOv5 detection, do the following. See the PyTorch Hub tutorial for details. You can also use results.pandas().xywh[0] for xcenter, ycenter, width, height format results.
This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the lightest and fastest YOLOv5 model. For details on all available models please see the README.
import torch
# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
# Image
img = 'https://ultralytics.com/images/zidane.jpg'
# Inference
results = model(img)
results.pandas().xyxy[0]
# xmin ymin xmax ymax confidence class name
# 0 749.50 43.50 1148.0 704.5 0.874023 0 person
# 1 433.50 433.50 517.5 714.5 0.687988 27 tie
# 2 114.75 195.75 1095.0 708.0 0.624512 0 person
# 3 986.00 304.00 1028.0 420.0 0.286865 27 tie
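For the xcenter, ycenter format mentioned above, the same results object can also be read in xywh form; a minimal sketch continuing the example (column names per the YOLOv5 pandas output):
# Center-point format: xcenter, ycenter, width, height
df = results.pandas().xywh[0]
print(df[['xcenter', 'ycenter']])  # center of each box in pixels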
Thanks for your answer, @glenn-jocher! But can I locate that point on the frame and print that location like "top right" or "bottom left" as I move the camera or the object?
@HeitorDC I don't know what you're asking. The value you wanted is obtained as in the example above. What you do with that value is up to you.
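For example, one way to map the center point to a coarse frame label; a minimal sketch, where frame_location is a hypothetical helper (not part of YOLOv5) and the 1280x720 size of the example image is assumed:
def frame_location(xcenter, ycenter, frame_w, frame_h):
    # Name the cell of a 3x3 grid that the center point falls in
    col = 'left' if xcenter < frame_w / 3 else 'right' if xcenter > 2 * frame_w / 3 else 'center'
    row = 'top' if ycenter < frame_h / 3 else 'bottom' if ycenter > 2 * frame_h / 3 else 'middle'
    return f'{row} {col}'  # e.g. 'top right', 'bottom left'

det = results.pandas().xywh[0].iloc[0]  # first detection from the example above
print(frame_location(det.xcenter, det.ycenter, 1280, 720))  # assumed frame size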
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
@HeitorDC I saw your question and I tried to find an angle.
You cannot find the angle from the center point of the detected box alone.
Crop the box from the picture and first convert it to grayscale (the imports below are needed for the snippets that follow):
import math
import cv2
import numpy as np
from scipy import ndimage

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Apply Canny to find the edges, but be careful with the parameters, otherwise it will not work properly; optimize them for your own workspace.
edges = cv2.Canny(gray, 50, 150, apertureSize=3)
Let's collect the angle each line in the box makes with the horizontal plane into an array using "HoughLinesP":
angles = []
lines = cv2.HoughLinesP(edges, 1, math.pi / 180.0, 90)
for [[x1, y1, x2, y2]] in lines:
    # cv2.line(img, (x1, y1), (x2, y2), (255, 0, 0), 3)  # optional: draw each detected line
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle != 0:
        angles.append(angle)
Take the median of these angles as the angle made by the box, and rotate the box accordingly:
median_angle = np.median(angles)
img = ndimage.rotate(img, median_angle)
I hope it solves your problem.
When we use --save-crop, the files are saved in '.jpg' format by default. I want the crops saved as '.tif' files instead of jpg. What changes need to be made in detect.py?
@Ramanmagar to save crops in '.tif' format instead of '.jpg' in detect.py, modify the code that handles saving the cropped images. In the YOLOv5 detect.py file, find the section that saves the crop with the default '.jpg' extension and change the extension to '.tif'.
For example, if the code looks like this:
cv2.imwrite('crop.jpg', crop)
Change it to:
cv2.imwrite('crop.tif', crop)
This will ensure that the cropped images are saved in '.tif' format.
Remember to test the changes to ensure that the modified code is functioning as expected.
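Note that in recent upstream versions of detect.py the crop is written by save_one_box rather than a direct cv2.imwrite call, so the suffix would be changed there instead; a sketch, assuming your detect.py matches upstream:
# In detect.py (the call site):
if save_crop:
    save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.tif', BGR=True)

# In utils/general.py, some versions force the extension inside save_one_box, e.g.:
#   f = str(increment_path(file).with_suffix('.jpg'))
# If yours does, change '.jpg' to '.tif' there as well.
cv2.imwrite selects the encoder from the file extension, so no other changes should be needed.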
Hi, guys!
I'm using YOLOv5 in my master's project and I want to know how to get the angles of the central point of the bounding box relative to the camera, and how to get its location in the frame, e.g. whether the central point is in the bottom left or top right. The "detect_principal.py" and "datasets.py" that I'm using are below.
I've already managed to display the depth information from the Intel RealSense D435i in the terminal, but I also need it to present the angles of the bounding box in the world relative to the camera and its position relative to the frame.
I really need help with this to get ahead with my project.
Thanks for your attention and help!
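Regarding the angles: one way is standard pinhole geometry using the RealSense intrinsics. A minimal sketch, assuming a pyrealsense2 pipeline is already streaming (variable names are illustrative):
import math
import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start(rs.config())
intr = profile.get_stream(rs.stream.depth).as_video_stream_profile().get_intrinsics()

def pixel_to_angles(u, v, intr):
    # Horizontal and vertical angles (degrees) of pixel (u, v) from the optical axis
    ax = math.degrees(math.atan2(u - intr.ppx, intr.fx))
    ay = math.degrees(math.atan2(v - intr.ppy, intr.fy))
    return ax, ay

# With the depth you already read, the full 3D point in camera coordinates is:
# point = rs.rs2_deproject_pixel_to_point(intr, [u, v], depth)
Feeding the bounding-box center (xcenter, ycenter) into pixel_to_angles gives its angular offset from the camera axis.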
###############################################################################################
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
"""Run inference on images, videos, directories, streams, etc.

Usage:
    $ python path/to/detect.py --source path/to/img.jpg --weights yolov5s.pt --img 640
"""
import argparse
import sys
import time
from pathlib import Path
import cv2
import pyrealsense2
import numpy as np
import torch
import torch.backends.cudnn as cudnn
FILE = Path(__file__).absolute()
sys.path.append(FILE.parents[0].as_posix())  # add yolov5/ to path
from realsense_depth import *
from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages, LoadRealSense2
from utils.general import check_img_size, check_requirements, check_imshow, colorstr, non_max_suppression, \
    apply_classifier, scale_coords, xyxy2xywh, strip_optimizer, set_logging, increment_path, save_one_box
from utils.plots import colors, plot_one_box
from utils.torch_utils import select_device, load_classifier, time_sync
@torch.no_grad()
def run(weights='yolov5s.pt',  # model.pt path(s)
        source='data/images',  # file/dir/URL/glob, 0 for webcam
        imgsz=640,  # inference size (pixels)
        conf_thres=0.25,  # confidence threshold
        iou_thres=0.45,  # NMS IOU threshold
        max_det=1000,  # maximum detections per image
        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        view_img=False,  # show results
        save_txt=False,  # save results to *.txt
        save_conf=False,  # save confidences in --save-txt labels
        save_crop=False,  # save cropped prediction boxes
        nosave=False,  # do not save images/videos
        classes=None,  # filter by class: --class 0, or --class 0 2 3
        agnostic_nms=False,  # class-agnostic NMS
        augment=False,  # augmented inference
        visualize=False,  # visualize features
        update=False,  # update all models
        project='runs/detect',  # save results to project/name
        name='exp',  # save results to project/name
        exist_ok=False,  # existing project/name ok, do not increment
        line_thickness=3,  # bounding box thickness (pixels)
        hide_labels=False,  # hide labels
        hide_conf=False,  # hide confidences
        half=False,  # use FP16 half-precision inference
        tfl_int8=False,  # INT8 quantized TFLite model
        ):
    save_img = not nosave and not source.endswith('.txt')  # save inference images
    webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
        ('rtsp://', 'rtmp://', 'http://', 'https://'))
'''
Save results (image with detections)
'''
def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='data/images', help='file/dir/URL/glob, 0 for webcam')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-crop', action='store_true', help='save cropped prediction boxes')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--visualize', action='store_true', help='visualize features')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--line-thickness', default=3, type=int, help='bounding box thickness (pixels)')
    parser.add_argument('--hide-labels', default=False, action='store_true', help='hide labels')
    parser.add_argument('--hide-conf', default=False, action='store_true', help='hide confidences')
    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')
    parser.add_argument('--tfl-int8', action='store_true', help='INT8 quantized TFLite model')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    return opt
def main(opt):
    print(colorstr('detect: ') + ', '.join(f'{k}={v}' for k, v in vars(opt).items()))

if __name__ == '__main__':
    opt = parse_opt()
    main(opt)
##############################################################################################
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
"""Dataloaders and dataset utils"""
import glob
import hashlib
import json
import logging
import os
import random
import shutil
import time
from itertools import repeat
from multiprocessing.pool import ThreadPool, Pool
from pathlib import Path
from threading import Thread
import cv2
import numpy as np
import pyrealsense2 as rs
import torch
import torch.nn.functional as F
import yaml
from PIL import Image, ExifTags
from torch.utils.data import Dataset
from tqdm import tqdm
from utils.augmentations import Albumentations, augment_hsv, copy_paste, letterbox, mixup, random_perspective
from utils.general import check_requirements, check_file, check_dataset, xywh2xyxy, xywhn2xyxy, xyxy2xywhn, \
    xyn2xy, segments2boxes, clean_str, xyxy2xywh
from utils.torch_utils import torch_distributed_zero_first
# Parameters
HELP_URL = 'https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data'
IMG_FORMATS = ['bmp', 'jpg', 'jpeg', 'png', 'tif', 'tiff', 'dng', 'webp', 'mpo']  # acceptable image suffixes
VID_FORMATS = ['mov', 'avi', 'mp4', 'mpg', 'mpeg', 'm4v', 'wmv', 'mkv']  # acceptable video suffixes
NUM_THREADS = min(8, os.cpu_count())  # number of multiprocessing threads
# Get orientation exif tag
for orientation in ExifTags.TAGS.keys():
    if ExifTags.TAGS[orientation] == 'Orientation':
        break
def get_hash(paths):
    """Returns a single hash value of a list of paths (files or dirs)"""

def exif_size(img):
    """Returns exif-corrected PIL size"""
def exif_transpose(image):
    """
    Transpose a PIL image accordingly if it has an EXIF Orientation tag.
    From https://github.com/python-pillow/Pillow/blob/master/src/PIL/ImageOps.py
    """
def create_dataloader(path, imgsz, batch_size, stride, single_cls=False, hyp=None, augment=False, cache=False,
                      pad=0.0, rect=False, rank=-1, workers=8, image_weights=False, quad=False, prefix=''):
    # Make sure only the first process in DDP process the dataset first, and the following others can use the cache
class InfiniteDataLoader(torch.utils.data.dataloader.DataLoader):
    """ Dataloader that reuses workers """
class _RepeatSampler(object):
    """ Sampler that repeats forever """
class LoadImages:  # for inference
    def __init__(self, path, img_size=640, stride=32, auto=True):
        p = str(Path(path).absolute())  # os-agnostic absolute path
        if '*' in p:
            files = sorted(glob.glob(p, recursive=True))  # glob
        elif os.path.isdir(p):
            files = sorted(glob.glob(os.path.join(p, '*.*')))  # dir
        elif os.path.isfile(p):
            files = [p]  # files
        else:
            raise Exception(f'ERROR: {p} does not exist')
class LoadWebcam:  # for inference
    def __init__(self, pipe='0', img_size=640, stride=32):
        self.img_size = img_size
        self.stride = stride
        self.pipe = eval(pipe) if pipe.isnumeric() else pipe
        self.cap = cv2.VideoCapture(self.pipe)  # video capture object
        self.cap.set(cv2.CAP_PROP_BUFFERSIZE, 3)  # set buffer size
class LoadStreams:  # multiple IP or RTSP cameras
    def __init__(self, sources='streams.txt', img_size=640, stride=32, auto=True):
        self.mode = 'stream'
        self.img_size = img_size
        self.stride = stride
def img2label_paths(img_paths):
    """Define label paths as a function of image paths"""
# Stream from RealSense D435
class LoadRealSense2:  # Stream from Intel RealSense D435
    def __init__(self, width=640, height=480, fps=30, img_size=512):
        # Variables for setup
class LoadImagesAndLabels(Dataset):  # for training/testing
    def __init__(self, path, img_size=640, batch_size=16, augment=False, hyp=None, rect=False, image_weights=False,
                 cache_images=False, single_cls=False, stride=32, pad=0.0, prefix=''):
        self.img_size = img_size
        self.augment = augment
        self.hyp = hyp
        self.image_weights = image_weights
        self.rect = False if image_weights else rect
        self.mosaic = self.augment and not self.rect  # load 4 images at a time into a mosaic (only during training)
        self.mosaic_border = [-img_size // 2, -img_size // 2]
        self.stride = stride
        self.path = path
        self.albumentations = Albumentations() if augment else None
# Ancillary functions ------------------------------------------------------------------------------------------------

def load_image(self, i):
    """loads 1 image from dataset index 'i', returns im, original hw, resized hw"""

def load_mosaic(self, index):
    """loads images in a 4-mosaic"""

def load_mosaic9(self, index):
    """loads images in a 9-mosaic"""

def create_folder(path='./new'):
    """Create folder"""

def flatten_recursive(path='../datasets/coco128'):
    """Flatten a recursive directory by bringing all files to top level"""

def extract_boxes(path='../datasets/coco128'):  # from utils.datasets import *; extract_boxes()
    """Convert detection dataset into classification dataset, with one directory per class"""
def autosplit(path='../datasets/coco128/images', weights=(0.9, 0.1, 0.0), annotated_only=False):
    """ Autosplit a dataset into train/val/test splits and save path/autosplit_*.txt files
    Usage: from utils.datasets import *; autosplit()
    Arguments
        path: Path to images directory
        weights: Train, val, test weights (list, tuple)
        annotated_only: Only use images with an annotated txt file
    """
    path = Path(path)  # images dir
    files = sum([list(path.rglob(f"*.{img_ext}")) for img_ext in IMG_FORMATS], [])  # image files only
    n = len(files)  # number of files
    random.seed(0)  # for reproducibility
    indices = random.choices([0, 1, 2], weights=weights, k=n)  # assign each image to a split
def verify_image_label(args):
    """Verify one image-label pair"""
def dataset_stats(path='coco128.yaml', autodownload=False, verbose=False, profile=False, hub=False):
    """ Return dataset statistics dictionary with images and instances counts per split per class
    To run in parent directory: export PYTHONPATH="$PWD/yolov5"
    Usage1: from utils.datasets import *; dataset_stats('coco128.yaml', autodownload=True)
    Usage2: from utils.datasets import *; dataset_stats('../datasets/coco128_with_yaml.zip')
    Arguments
        path: Path to data.yaml or data.zip (with data.yaml inside data.zip)
        autodownload: Attempt to download dataset if not found locally
        verbose: Print stats dictionary
    """
#################################################################################################