ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

how to get the angles of a central point of the bounding box in relation to the camera? #5302

Closed HeitorDC closed 2 years ago

HeitorDC commented 3 years ago

Hi, guys!

I'm using YOLOv5 in my master's project and I want to know how to get the angles of the central point of the bounding box in relation to the camera, and how to get the location of that point in the frame, e.g. whether the central point is in the bottom left or top right. The "detect_principal.py" and "datasets.py" files that I'm using are below.

I've already managed to display the depth information from the Intel RealSense D435i in the terminal, but I also need it to report the angles of the bounding box in the world in relation to the camera and its position in relation to the frame.

I really need help with this to move ahead with my project.

Thanks for your attention and help!

###############################################################################################

# YOLOv5 πŸš€ by Ultralytics, GPL-3.0 license

""" Run inference on images, videos, directories, streams, etc. Usage: $ python path/to/detect.py --source path/to/img.jpg --weights yolov5s.pt --img 640 """

import argparse
import sys
import time
from pathlib import Path

import cv2
import pyrealsense2
import numpy as np
import torch
import torch.backends.cudnn as cudnn

FILE = Path(__file__).absolute()
sys.path.append(FILE.parents[0].as_posix())  # add yolov5/ to path

from realsense_depth import *
from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages, LoadRealSense2
from utils.general import check_img_size, check_requirements, check_imshow, colorstr, non_max_suppression, \
    apply_classifier, scale_coords, xyxy2xywh, strip_optimizer, set_logging, increment_path, save_one_box
from utils.plots import colors, plot_one_box
from utils.torch_utils import select_device, load_classifier, time_sync

@torch.no_grad()
def run(weights='yolov5s.pt',  # model.pt path(s)
        source='data/images',  # file/dir/URL/glob, 0 for webcam
        imgsz=640,  # inference size (pixels)
        conf_thres=0.25,  # confidence threshold
        iou_thres=0.45,  # NMS IOU threshold
        max_det=1000,  # maximum detections per image
        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        view_img=False,  # show results
        save_txt=False,  # save results to *.txt
        save_conf=False,  # save confidences in --save-txt labels
        save_crop=False,  # save cropped prediction boxes
        nosave=False,  # do not save images/videos
        classes=None,  # filter by class: --class 0, or --class 0 2 3
        agnostic_nms=False,  # class-agnostic NMS
        augment=False,  # augmented inference
        visualize=False,  # visualize features
        update=False,  # update all models
        project='runs/detect',  # save results to project/name
        name='exp',  # save results to project/name
        exist_ok=False,  # existing project/name ok, do not increment
        line_thickness=3,  # bounding box thickness (pixels)
        hide_labels=False,  # hide labels
        hide_conf=False,  # hide confidences
        half=False,  # use FP16 half-precision inference
        tfl_int8=False,  # INT8 quantized TFLite model
        ):
save_img = not nosave and not source.endswith('.txt')  # save inference images
webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
    ('rtsp://', 'rtmp://', 'http://', 'https://'))

# Directories
save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
(save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

# Initialize
set_logging()
device = select_device(device)
half &= device.type != 'cpu'  # half precision only supported on CUDA

# Load model
w = weights[0] if isinstance(weights, list) else weights
classify, suffix = False, Path(w).suffix.lower()
pt, onnx, tflite, pb, saved_model = (suffix == x for x in ['.pt', '.onnx', '.tflite', '.pb', ''])  # backend
stride, names = 64, [f'class{i}' for i in range(1000)]  # assign defaults
if pt:
    model = attempt_load(weights, map_location=device)  # load FP32 model
    stride = int(model.stride.max())  # model stride
    names = model.module.names if hasattr(model, 'module') else model.names  # get class names
    if half:
        model.half()  # to FP16
    if classify:  # second-stage classifier
        modelc = load_classifier(name='resnet50', n=2)  # initialize
        modelc.load_state_dict(torch.load('resnet50.pt', map_location=device)['model']).to(device).eval()
elif onnx:
    check_requirements(('onnx', 'onnxruntime'))
    import onnxruntime
    session = onnxruntime.InferenceSession(w, None)
else:  # TensorFlow models
    check_requirements(('tensorflow>=2.4.1',))
    import tensorflow as tf
    if pb:  # https://www.tensorflow.org/guide/migrate#a_graphpb_or_graphpbtxt
        def wrap_frozen_graph(gd, inputs, outputs):
            x = tf.compat.v1.wrap_function(lambda: tf.compat.v1.import_graph_def(gd, name=""), [])  # wrapped import
            return x.prune(tf.nest.map_structure(x.graph.as_graph_element, inputs),
                           tf.nest.map_structure(x.graph.as_graph_element, outputs))

        graph_def = tf.Graph().as_graph_def()
        graph_def.ParseFromString(open(w, 'rb').read())
        frozen_func = wrap_frozen_graph(gd=graph_def, inputs="x:0", outputs="Identity:0")
    elif saved_model:
        model = tf.keras.models.load_model(w)
    elif tflite:
        interpreter = tf.lite.Interpreter(model_path=w)  # load TFLite model
        interpreter.allocate_tensors()  # allocate
        input_details = interpreter.get_input_details()  # inputs
        output_details = interpreter.get_output_details()  # outputs
imgsz = check_img_size(imgsz, s=stride)  # check image size

# Dataloader
if webcam:
    view_img = check_imshow()
    cudnn.benchmark = True  # set True to speed up constant image size inference
    dataset = LoadRealSense2(width = 640, height = 480, fps = 30, img_size = imgsz)
    bs = len(dataset)  # batch_size
else:
    dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt)
    bs = 1  # batch_size
vid_path, vid_writer = [None] * bs, [None] * bs

# Run inference
if pt and device.type != 'cpu':
    model(torch.zeros(1, 3, *imgsz).to(device).type_as(next(model.parameters())))  # run once
t0 = time.time()
for path, depth, distance, depth_scale, img, im0s, vid_cap in dataset:

    if onnx:
        img = img.astype('float32')
    else:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
    img = img / 255.0  # 0 - 255 to 0.0 - 1.0
    if len(img.shape) == 3:
        img = img[None]  # expand for batch dim

    # Inference
    t1 = time_sync()
    if pt:
        visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
        pred = model(img, augment=augment, visualize=visualize)[0]
    elif onnx:
        pred = torch.tensor(session.run([session.get_outputs()[0].name], {session.get_inputs()[0].name: img}))
    else:  # tensorflow model (tflite, pb, saved_model)
        imn = img.permute(0, 2, 3, 1).cpu().numpy()  # image in numpy
        if pb:
            pred = frozen_func(x=tf.constant(imn)).numpy()
        elif saved_model:
            pred = model(imn, training=False).numpy()
        elif tflite:
            if tfl_int8:
                scale, zero_point = input_details[0]['quantization']
                imn = (imn / scale + zero_point).astype(np.uint8)
            interpreter.set_tensor(input_details[0]['index'], imn)
            interpreter.invoke()
            pred = interpreter.get_tensor(output_details[0]['index'])
            if tfl_int8:
                scale, zero_point = output_details[0]['quantization']
                pred = (pred.astype(np.float32) - zero_point) * scale
        pred[..., 0] *= imgsz[1]  # x
        pred[..., 1] *= imgsz[0]  # y
        pred[..., 2] *= imgsz[1]  # w
        pred[..., 3] *= imgsz[0]  # h
        pred = torch.tensor(pred)
        # print(pred)

    # NMS
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
    t2 = time_sync()

    # Second-stage classifier (optional)
    if classify:
        pred = apply_classifier(pred, modelc, img, im0s)

    # Process predictions
    for i, det in enumerate(pred):  # detections per image
        if webcam:  # batch_size >= 1
            p, s, im0, frame = path[i], f'{i}: ', im0s[i].copy(), dataset.count
        else:
            p, s, im0, frame = path, '', im0s.copy(), getattr(dataset, 'frame', 0)

        p = Path(p)  # to Path
        save_path = str(save_dir / p.name)  # img.jpg
        txt_path = str(save_dir / 'labels' / p.stem) #+ ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
        s += '%gx%g ' % img.shape[2:]  # #print string
        gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
        imc = im0.copy() if save_crop else im0  # for save_crop
        # print('a')
        if len(det):
            print()  # start analyzing here
            print("New Frame")
            for *xyxy, conf, cls in reversed(det):
                print(cls, torch.tensor(xyxy))
                xmin, ymin, xmax, ymax = torch.tensor(xyxy)
                xcenter = [(xmax + xmin) / 2]
                ycenter = [(ymax + ymin) / 2]
                print(xcenter, ycenter)

            for i, j in zip(xcenter,ycenter):
                try: 
                    distancia = (distance)/1000
                    print(distancia[int(j),int(i)], 'meters')
                except:
                    pass    

            # Rescale boxes from img_size to im0 size
            det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

            # Write results
            for *xyxy, conf, cls in reversed(det):
                if save_txt:  # Write to file
                    xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                    # print("-"*20)
                    print(cls, torch.tensor(xyxy))

                    line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
                    with open(txt_path + '.txt', 'a') as f:
                        f.write(('%g ' * len(line)).rstrip() % line + '\n')

                if save_img or save_crop or view_img:  # Add bbox to image
                    c = int(cls)  # integer class
                    label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                    im0 = plot_one_box(xyxy, im0, label=label, color=colors(c, True), line_width=line_thickness)
                    depth = plot_one_box(xyxy, depth, label=label, color=colors(c, True), line_width=line_thickness)
                    if save_crop:
                        save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

            for i, j in zip(xcenter,ycenter):
                try: 
                    cv2.circle(im0,(int(i), int(j)),4,(0,255,0))
                    cv2.circle(depth,(int(i), int(j)),4,(0,255,0))
                except:
                    pass    
        # #print time (inference + NMS)
        # print(f'{s}Done. ({t2 - t1:.3f}s)')
        # print(xyxy)

        # Stream results
        if view_img:
            cv2.imshow(str(3), depth)
            cv2.imshow(str(p), im0)
            cv2.waitKey(1)  # 1 millisecond

'''

Save results (image with detections)

        if save_img:
            #if dataset.mode == 'image':
            #    cv2.imwrite(save_path, im0)
            #else:  # 'video' or 'stream'
            if vid_path[i] != save_path:  # new video
                vid_path[i] = save_path
                if isinstance(vid_writer[i], cv2.VideoWriter):
                    vid_writer[i].release()  # release previous video writer
                if vid_cap:  # video
                    fps = vid_cap.get(cv2.CAP_PROP_FPS)
                    w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                    h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                else:  # stream
                    fps, w, h = 30, im0.shape[1], im0.shape[0]
                    save_path += '.mp4'
                vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
            vid_writer[i].write(im0)

if save_txt or save_img:
    s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
    # print(f"Results saved to {colorstr('bold', save_dir)}{s}")

if update:
    strip_optimizer(weights)  # update model (to fix SourceChangeWarning)

# print(f'Done. ({time.time() - t0:.3f}s)')
# print('a')

'''

def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='data/images', help='file/dir/URL/glob, 0 for webcam')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-crop', action='store_true', help='save cropped prediction boxes')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--visualize', action='store_true', help='visualize features')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--line-thickness', default=3, type=int, help='bounding box thickness (pixels)')
    parser.add_argument('--hide-labels', default=False, action='store_true', help='hide labels')
    parser.add_argument('--hide-conf', default=False, action='store_true', help='hide confidences')
    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')
    parser.add_argument('--tfl-int8', action='store_true', help='INT8 quantized TFLite model')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    return opt

def main(opt):
    print(colorstr('detect: ') + ', '.join(f'{k}={v}' for k, v in vars(opt).items()))
    check_requirements(exclude=('tensorboard', 'thop'))
    run(**vars(opt))


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)

##############################################################################################

# YOLOv5 πŸš€ by Ultralytics, GPL-3.0 license

""" Dataloaders and dataset utils """

import glob
import hashlib
import json
import logging
import os
import random
import shutil
import time
from itertools import repeat
from multiprocessing.pool import ThreadPool, Pool
from pathlib import Path
from threading import Thread

import cv2
import numpy as np
import pyrealsense2 as rs
import torch
import torch.nn.functional as F
import yaml
from PIL import Image, ExifTags
from torch.utils.data import Dataset
from tqdm import tqdm

from utils.augmentations import Albumentations, augment_hsv, copy_paste, letterbox, mixup, random_perspective
from utils.general import check_requirements, check_file, check_dataset, xywh2xyxy, xywhn2xyxy, xyxy2xywhn, \
    xyn2xy, segments2boxes, clean_str, xyxy2xywh
from utils.torch_utils import torch_distributed_zero_first

# Parameters
HELP_URL = 'https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data'
IMG_FORMATS = ['bmp', 'jpg', 'jpeg', 'png', 'tif', 'tiff', 'dng', 'webp', 'mpo']  # acceptable image suffixes
VID_FORMATS = ['mov', 'avi', 'mp4', 'mpg', 'mpeg', 'm4v', 'wmv', 'mkv']  # acceptable video suffixes
NUM_THREADS = min(8, os.cpu_count())  # number of multiprocessing threads

# Get orientation exif tag
for orientation in ExifTags.TAGS.keys():
    if ExifTags.TAGS[orientation] == 'Orientation':
        break

def get_hash(paths):

# Returns a single hash value of a list of paths (files or dirs)

size = sum(os.path.getsize(p) for p in paths if os.path.exists(p))  # sizes
h = hashlib.md5(str(size).encode())  # hash sizes
h.update(''.join(paths).encode())  # hash paths
return h.hexdigest()  # return hash

def exif_size(img):

# Returns exif-corrected PIL size

s = img.size  # (width, height)
try:
    rotation = dict(img._getexif().items())[orientation]
    if rotation == 6:  # rotation 270
        s = (s[1], s[0])
    elif rotation == 8:  # rotation 90
        s = (s[1], s[0])
except:
    pass

return s

def exif_transpose(image):
    """
    Transpose a PIL image accordingly if it has an EXIF Orientation tag.
    From https://github.com/python-pillow/Pillow/blob/master/src/PIL/ImageOps.py

:param image: The image to transpose.
:return: An image.
"""
exif = image.getexif()
orientation = exif.get(0x0112, 1)  # default 1
if orientation > 1:
    method = {2: Image.FLIP_LEFT_RIGHT,
              3: Image.ROTATE_180,
              4: Image.FLIP_TOP_BOTTOM,
              5: Image.TRANSPOSE,
              6: Image.ROTATE_270,
              7: Image.TRANSVERSE,
              8: Image.ROTATE_90,
              }.get(orientation)
    if method is not None:
        image = image.transpose(method)
        del exif[0x0112]
        image.info["exif"] = exif.tobytes()
return image

def create_dataloader(path, imgsz, batch_size, stride, single_cls=False, hyp=None, augment=False, cache=False, pad=0.0, rect=False, rank=-1, workers=8, image_weights=False, quad=False, prefix=''):

# Make sure only the first process in DDP process the dataset first, and the following others can use the cache

with torch_distributed_zero_first(rank):
    dataset = LoadImagesAndLabels(path, imgsz, batch_size,
                                  augment=augment,  # augment images
                                  hyp=hyp,  # augmentation hyperparameters
                                  rect=rect,  # rectangular training
                                  cache_images=cache,
                                  single_cls=single_cls,
                                  stride=int(stride),
                                  pad=pad,
                                  image_weights=image_weights,
                                  prefix=prefix)

batch_size = min(batch_size, len(dataset))
nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, workers])  # number of workers
sampler = torch.utils.data.distributed.DistributedSampler(dataset) if rank != -1 else None
loader = torch.utils.data.DataLoader if image_weights else InfiniteDataLoader
# Use torch.utils.data.DataLoader() if dataset.properties will update during training else InfiniteDataLoader()
dataloader = loader(dataset,
                    batch_size=batch_size,
                    num_workers=nw,
                    sampler=sampler,
                    pin_memory=True,
                    collate_fn=LoadImagesAndLabels.collate_fn4 if quad else LoadImagesAndLabels.collate_fn)
return dataloader, dataset

class InfiniteDataLoader(torch.utils.data.dataloader.DataLoader):
    """ Dataloader that reuses workers

Uses same syntax as vanilla DataLoader
"""

def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    object.__setattr__(self, 'batch_sampler', _RepeatSampler(self.batch_sampler))
    self.iterator = super().__iter__()

def __len__(self):
    return len(self.batch_sampler.sampler)

def __iter__(self):
    for i in range(len(self)):
        yield next(self.iterator)

class _RepeatSampler(object):
    """ Sampler that repeats forever

Args:
    sampler (Sampler)
"""

def __init__(self, sampler):
    self.sampler = sampler

def __iter__(self):
    while True:
        yield from iter(self.sampler)

class LoadImages:  # for inference

def __init__(self, path, img_size=640, stride=32, auto=True):
    p = str(Path(path).absolute())  # os-agnostic absolute path
    if '*' in p:
        files = sorted(glob.glob(p, recursive=True))  # glob
    elif os.path.isdir(p):
        files = sorted(glob.glob(os.path.join(p, '*.*')))  # dir
    elif os.path.isfile(p):
        files = [p]  # files
    else:
        raise Exception(f'ERROR: {p} does not exist')

    images = [x for x in files if x.split('.')[-1].lower() in IMG_FORMATS]
    videos = [x for x in files if x.split('.')[-1].lower() in VID_FORMATS]
    ni, nv = len(images), len(videos)

    self.img_size = img_size
    self.stride = stride
    self.files = images + videos
    self.nf = ni + nv  # number of files
    self.video_flag = [False] * ni + [True] * nv
    self.mode = 'image'
    self.auto = auto
    if any(videos):
        self.new_video(videos[0])  # new video
    else:
        self.cap = None
    assert self.nf > 0, f'No images or videos found in {p}. ' \
                        f'Supported formats are:\nimages: {IMG_FORMATS}\nvideos: {VID_FORMATS}'

def __iter__(self):
    self.count = 0
    return self

def __next__(self):
    if self.count == self.nf:
        raise StopIteration
    path = self.files[self.count]

    if self.video_flag[self.count]:
        # Read video
        self.mode = 'video'
        ret_val, img0 = self.cap.read()
        if not ret_val:
            self.count += 1
            self.cap.release()
            if self.count == self.nf:  # last video
                raise StopIteration
            else:
                path = self.files[self.count]
                self.new_video(path)
                ret_val, img0 = self.cap.read()

        self.frame += 1
        #print(f'video {self.count + 1}/{self.nf} ({self.frame}/{self.frames}) {path}: ', end='')

    else:
        # Read image
        self.count += 1
        img0 = cv2.imread(path)  # BGR
        assert img0 is not None, 'Image Not Found ' + path
        #print(f'image {self.count}/{self.nf} {path}: ', end='')

    # Padded resize
    img = letterbox(img0, self.img_size, stride=self.stride, auto=self.auto)[0]

    # Convert
    img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    img = np.ascontiguousarray(img)

    return path, img, img0, self.cap

def new_video(self, path):
    self.frame = 0
    self.cap = cv2.VideoCapture(path)
    self.frames = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT))

def __len__(self):
    return self.nf  # number of files

class LoadWebcam:  # for inference

def __init__(self, pipe='0', img_size=640, stride=32):
    self.img_size = img_size
    self.stride = stride
    self.pipe = eval(pipe) if pipe.isnumeric() else pipe
    self.cap = cv2.VideoCapture(self.pipe)  # video capture object
    self.cap.set(cv2.CAP_PROP_BUFFERSIZE, 3)  # set buffer size

def __iter__(self):
    self.count = -1
    return self

def __next__(self):
    self.count += 1
    if cv2.waitKey(1) == ord('q'):  # q to quit
        self.cap.release()
        cv2.destroyAllWindows()
        raise StopIteration

    # Read frame
    ret_val, img0 = self.cap.read()
    img0 = cv2.flip(img0, 1)  # flip left-right

    # #print
    assert ret_val, f'Camera Error {self.pipe}'
    img_path = 'webcam.jpg'
    #print(f'webcam {self.count}: ', end='')

    # Padded resize
    img = letterbox(img0, self.img_size, stride=self.stride)[0]

    # Convert
    img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    img = np.ascontiguousarray(img)

    return img_path, img, img0, None

def __len__(self):
    return 0

class LoadStreams:  # multiple IP or RTSP cameras

def __init__(self, sources='streams.txt', img_size=640, stride=32, auto=True):
    self.mode = 'stream'
    self.img_size = img_size
    self.stride = stride

    if os.path.isfile(sources):
        with open(sources, 'r') as f:
            sources = [x.strip() for x in f.read().strip().splitlines() if len(x.strip())]
    else:
        sources = [sources]

    n = len(sources)
    self.imgs, self.fps, self.frames, self.threads = [None] * n, [0] * n, [0] * n, [None] * n
    self.sources = [clean_str(x) for x in sources]  # clean source names for later
    self.auto = auto
    for i, s in enumerate(sources):  # index, source
        # Start thread to read frames from video stream
        #print(f'{i + 1}/{n}: {s}... ', end='')
        if 'youtube.com/' in s or 'youtu.be/' in s:  # if source is YouTube video
            check_requirements(('pafy', 'youtube_dl'))
            import pafy
            s = pafy.new(s).getbest(preftype="mp4").url  # YouTube URL
        s = eval(s) if s.isnumeric() else s  # i.e. s = '0' local webcam
        cap = cv2.VideoCapture(s)
        assert cap.isOpened(), f'Failed to open {s}'
        #w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        #h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        w = 1280
        h = 980
        self.fps[i] = max(cap.get(cv2.CAP_PROP_FPS) % 100, 0) or 30.0  # 30 FPS fallback
        self.frames[i] = max(int(cap.get(cv2.CAP_PROP_FRAME_COUNT)), 0) or float('inf')  # infinite stream fallback

        _, self.imgs[i] = cap.read()  # guarantee first frame
        self.threads[i] = Thread(target=self.update, args=([i, cap]), daemon=True)
        #print(f" success ({self.frames[i]} frames {w}x{h} at {self.fps[i]:.2f} FPS)")
        self.threads[i].start()
    #print('')  # newline

    # check for common shapes
    s = np.stack([letterbox(x, self.img_size, stride=self.stride, auto=self.auto)[0].shape for x in self.imgs], 0)  # shapes
    self.rect = np.unique(s, axis=0).shape[0] == 1  # rect inference if all shapes equal
    if not self.rect:
        print()
        #print('WARNING: Different stream shapes detected. For optimal performance supply similarly-shaped streams.')

def update(self, i, cap):
    # Read stream `i` frames in daemon thread
    n, f, read = 0, self.frames[i], 1  # frame number, frame array, inference every 'read' frame
    while cap.isOpened() and n < f:
        n += 1
        # _, self.imgs[index] = cap.read()
        cap.grab()
        if n % read == 0:
            success, im = cap.retrieve()
            self.imgs[i] = im if success else self.imgs[i] * 0
        time.sleep(1 / self.fps[i])  # wait time

def __iter__(self):
    self.count = -1
    return self

def __next__(self):
    self.count += 1
    if not all(x.is_alive() for x in self.threads) or cv2.waitKey(1) == ord('q'):  # q to quit
        cv2.destroyAllWindows()
        raise StopIteration

    # Letterbox
    img0 = self.imgs.copy()
    img = [letterbox(x, self.img_size, stride=self.stride, auto=self.rect and self.auto)[0] for x in img0]

    # Stack
    img = np.stack(img, 0)

    # Convert
    img = img[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW
    img = np.ascontiguousarray(img)

    return self.sources, img, img0, None

def __len__(self):
    return len(self.sources)  # 1E12 frames = 32 streams at 30 FPS for 30 years

def img2label_paths(img_paths):

# Define label paths as a function of image paths

sa, sb = os.sep + 'images' + os.sep, os.sep + 'labels' + os.sep  # /images/, /labels/ substrings
return [sb.join(x.rsplit(sa, 1)).rsplit('.', 1)[0] + '.txt' for x in img_paths]

# Stream from RealSense D435

class LoadRealSense2:  # Stream from Intel RealSense D435

def __init__(self, width=640, height=480, fps=30, img_size=512):
    # Variables for setup

    self.width = width
    self.height = height
    self.fps = fps
    self.img_size = img_size
    self.half = False

    # Process variables
    self.imgs = [None]
    self.depths = [None]
    self.distance = None

    # Setup
    self.pipe = rs.pipeline()
    self.cfg = rs.config()
    self.cfg.enable_stream(rs.stream.depth, self.width, self.height, rs.format.z16, self.fps)
    self.cfg.enable_stream(rs.stream.color, self.width, self.height, rs.format.bgr8, self.fps)

    self.align = rs.align(rs.stream.color)
    self.colorizer = rs.colorizer()

    # Start streaming
    self.profile = self.pipe.start(self.cfg)
    self.path = rs.pipeline_profile()

    print("streaming at w = " + str(self.width) + " h = " + str(self.height) + " fps = " + str(self.fps))

def update(self):
    depth_scale = None
    # depth_intrin = None
    while True:
        #Wait for frames and get the data
        self.frames = self.pipe.wait_for_frames()
        self.depth_frame = self.frames.get_depth_frame()
        self.color_frame = self.frames.get_color_frame()

        #Wait until RGB and depth frames are synchronised
        if not self.depth_frame or not self.color_frame:
            continue

        # get intrinsics
        # depth_intrin = self.depth_frame.profile.as_video_stream_profile().intrinsics

        #get RGB data and convert it to numpy array
        img0 = np.asanyarray(self.color_frame.get_data())

        # aligned depth frame
        aligned_df = self.aligned(self.frames)

        # colorized depth_frame (display, np.array)
        depth0 = self.colorizing(aligned_df)

        # aligned depth -> for depth calculation
        distance0 = np.asanyarray(aligned_df.get_data())

        #get depth_scale
        depth_scale = self.scale(self.profile)

        # Expand the image dims to 4 dimensions (so it can go through the letterbox function)
        self.imgs = np.expand_dims(img0, axis=0)

        # The depth frame doesn't need this, because it won't be fed into YOLO
        self.depths = depth0
        self.distance = distance0

        break

    # Letterbox: resize the image to 416x416 so it can be fed to YOLOv3
    s = np.stack([letterbox(x, new_shape=self.img_size)[0].shape for x in self.imgs], 0)  # inference shapes

    # Rectangular
    self.rect = np.unique(s, axis=0).shape[0] == 1

    if not self.rect:
        print('WARNING: Different stream shapes detected. For optimal performance supply similarly-shaped streams.')

    time.sleep(0.01)  # wait time
    # return self.rect, depth_scale, depth_intrin
    return self.rect, depth_scale

# Helper to get the depth_scale
def scale(self, profile):
    depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
    return depth_scale

# Helper for the aligned depth frame, not yet converted to a numpy array
def aligned(self, frames):
    frames = self.align.process(frames)
    aligned_depth_frame = frames.get_depth_frame()
    return aligned_depth_frame

# Colorize the aligned depth (for display purposes only)
def colorizing(self, aligned_depth_frame):
    colorized_depth = np.asanyarray(self.colorizer.colorize(aligned_depth_frame).get_data())
    return(colorized_depth)

# This is probably unnecessary and could be removed, but it is handy for troubleshooting
def __iter__(self):
    self.count = 0
    return self

# Further processing of the captured data, especially the RGB frame, so it can be fed into YOLO
def __next__(self):
    self.count += 1

    # Take rect and depth_scale from update() so they can be returned from __next__()
    # self.rect, depth_scale, depth_intrin = self.update()
    self.rect, depth_scale = self.update()

    # Copy all data produced by update()
    img0 = self.imgs.copy()
    depth = self.depths.copy()
    distance = self.distance.copy()

    # # To exit the program
    # if cv2.waitKey(1) == ord('q'):  # q to quit
    #     cv2.destroyAllWindows()
    #     raise StopIteration

    # Path
    img_path = 'realsense.jpg'

    # Letterbox
    img = [letterbox(x, new_shape=self.img_size, auto=self.rect)[0] for x in img0]

    # Stack
    img = np.stack(img, 0)

    # Convert Image BGR to RGB
    img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, to 3x416x416, uint8 to float32
    img = np.ascontiguousarray(img, dtype=np.float16 if self.half else np.float32)
    #img /= 255.0  # 0 - 255 to 0.0 - 1.0

    # Return depth, depth0, img, img0
    # return str(img_path), depth, distance, depth_scale, img, img0, depth_intrin, None
    return str(img_path), depth, distance, depth_scale, img, img0, None

def __len__(self):
    return 0  # 1E12 frames = 32 streams at 30 FPS for 30 years

class LoadImagesAndLabels(Dataset):  # for training/testing

def __init__(self, path, img_size=640, batch_size=16, augment=False, hyp=None, rect=False, image_weights=False,
             cache_images=False, single_cls=False, stride=32, pad=0.0, prefix=''):
    self.img_size = img_size
    self.augment = augment
    self.hyp = hyp
    self.image_weights = image_weights
    self.rect = False if image_weights else rect
    self.mosaic = self.augment and not self.rect  # load 4 images at a time into a mosaic (only during training)
    self.mosaic_border = [-img_size // 2, -img_size // 2]
    self.stride = stride
    self.path = path
    self.albumentations = Albumentations() if augment else None

    try:
        f = []  # image files
        for p in path if isinstance(path, list) else [path]:
            p = Path(p)  # os-agnostic
            if p.is_dir():  # dir
                f += glob.glob(str(p / '**' / '*.*'), recursive=True)
                # f = list(p.rglob('**/*.*'))  # pathlib
            elif p.is_file():  # file
                with open(p, 'r') as t:
                    t = t.read().strip().splitlines()
                    parent = str(p.parent) + os.sep
                    f += [x.replace('./', parent) if x.startswith('./') else x for x in t]  # local to global path
                    # f += [p.parent / x.lstrip(os.sep) for x in t]  # local to global path (pathlib)
            else:
                raise Exception(f'{prefix}{p} does not exist')
        self.img_files = sorted([x.replace('/', os.sep) for x in f if x.split('.')[-1].lower() in IMG_FORMATS])
        # self.img_files = sorted([x for x in f if x.suffix[1:].lower() in img_formats])  # pathlib
        assert self.img_files, f'{prefix}No images found'
    except Exception as e:
        raise Exception(f'{prefix}Error loading data from {path}: {e}\nSee {HELP_URL}')

    # Check cache
    self.label_files = img2label_paths(self.img_files)  # labels
    cache_path = (p if p.is_file() else Path(self.label_files[0]).parent).with_suffix('.cache')
    try:
        cache, exists = np.load(cache_path, allow_pickle=True).item(), True  # load dict
        assert cache['version'] == 0.4 and cache['hash'] == get_hash(self.label_files + self.img_files)
    except:
        cache, exists = self.cache_labels(cache_path, prefix), False  # cache

    # Display cache
    nf, nm, ne, nc, n = cache.pop('results')  # found, missing, empty, corrupted, total
    if exists:
        d = f"Scanning '{cache_path}' images and labels... {nf} found, {nm} missing, {ne} empty, {nc} corrupted"
        tqdm(None, desc=prefix + d, total=n, initial=n)  # display cache results
        if cache['msgs']:
            logging.info('\n'.join(cache['msgs']))  # display warnings
    assert nf > 0 or not augment, f'{prefix}No labels in {cache_path}. Can not train without labels. See {HELP_URL}'

    # Read cache
    [cache.pop(k) for k in ('hash', 'version', 'msgs')]  # remove items
    labels, shapes, self.segments = zip(*cache.values())
    self.labels = list(labels)
    self.shapes = np.array(shapes, dtype=np.float64)
    self.img_files = list(cache.keys())  # update
    self.label_files = img2label_paths(cache.keys())  # update
    if single_cls:
        for x in self.labels:
            x[:, 0] = 0

    n = len(shapes)  # number of images
    bi = np.floor(np.arange(n) / batch_size).astype(np.int)  # batch index
    nb = bi[-1] + 1  # number of batches
    self.batch = bi  # batch index of image
    self.n = n
    self.indices = range(n)

    # Rectangular Training
    if self.rect:
        # Sort by aspect ratio
        s = self.shapes  # wh
        ar = s[:, 1] / s[:, 0]  # aspect ratio
        irect = ar.argsort()
        self.img_files = [self.img_files[i] for i in irect]
        self.label_files = [self.label_files[i] for i in irect]
        self.labels = [self.labels[i] for i in irect]
        self.shapes = s[irect]  # wh
        ar = ar[irect]

        # Set training image shapes
        shapes = [[1, 1]] * nb
        for i in range(nb):
            ari = ar[bi == i]
            mini, maxi = ari.min(), ari.max()
            if maxi < 1:
                shapes[i] = [maxi, 1]
            elif mini > 1:
                shapes[i] = [1, 1 / mini]

        self.batch_shapes = np.ceil(np.array(shapes) * img_size / stride + pad).astype(np.int) * stride

    # Cache images into memory for faster training (WARNING: large datasets may exceed system RAM)
    self.imgs, self.img_npy = [None] * n, [None] * n
    if cache_images:
        if cache_images == 'disk':
            self.im_cache_dir = Path(Path(self.img_files[0]).parent.as_posix() + '_npy')
            self.img_npy = [self.im_cache_dir / Path(f).with_suffix('.npy').name for f in self.img_files]
            self.im_cache_dir.mkdir(parents=True, exist_ok=True)
        gb = 0  # Gigabytes of cached images
        self.img_hw0, self.img_hw = [None] * n, [None] * n
        results = ThreadPool(NUM_THREADS).imap(lambda x: load_image(*x), zip(repeat(self), range(n)))
        pbar = tqdm(enumerate(results), total=n)
        for i, x in pbar:
            if cache_images == 'disk':
                if not self.img_npy[i].exists():
                    np.save(self.img_npy[i].as_posix(), x[0])
                gb += self.img_npy[i].stat().st_size
            else:
                self.imgs[i], self.img_hw0[i], self.img_hw[i] = x  # im, hw_orig, hw_resized = load_image(self, i)
                gb += self.imgs[i].nbytes
            pbar.desc = f'{prefix}Caching images ({gb / 1E9:.1f}GB {cache_images})'
        pbar.close()

def cache_labels(self, path=Path('./labels.cache'), prefix=''):
    # Cache dataset labels, check images and read shapes
    x = {}  # dict
    nm, nf, ne, nc, msgs = 0, 0, 0, 0, []  # number missing, found, empty, corrupt, messages
    desc = f"{prefix}Scanning '{path.parent / path.stem}' images and labels..."
    with Pool(NUM_THREADS) as pool:
        pbar = tqdm(pool.imap_unordered(verify_image_label, zip(self.img_files, self.label_files, repeat(prefix))),
                    desc=desc, total=len(self.img_files))
        for im_file, l, shape, segments, nm_f, nf_f, ne_f, nc_f, msg in pbar:
            nm += nm_f
            nf += nf_f
            ne += ne_f
            nc += nc_f
            if im_file:
                x[im_file] = [l, shape, segments]
            if msg:
                msgs.append(msg)
            pbar.desc = f"{desc}{nf} found, {nm} missing, {ne} empty, {nc} corrupted"

    pbar.close()
    if msgs:
        logging.info('\n'.join(msgs))
    if nf == 0:
        logging.info(f'{prefix}WARNING: No labels found in {path}. See {HELP_URL}')
    x['hash'] = get_hash(self.label_files + self.img_files)
    x['results'] = nf, nm, ne, nc, len(self.img_files)
    x['msgs'] = msgs  # warnings
    x['version'] = 0.4  # cache version
    try:
        np.save(path, x)  # save cache for next time
        path.with_suffix('.cache.npy').rename(path)  # remove .npy suffix
        logging.info(f'{prefix}New cache created: {path}')
    except Exception as e:
        logging.info(f'{prefix}WARNING: Cache directory {path.parent} is not writeable: {e}')  # path not writeable
    return x

def __len__(self):
    return len(self.img_files)

# def __iter__(self):
#     self.count = -1
#     #print('ran dataset iter')
#     #self.shuffled_vector = np.random.permutation(self.nF) if self.augment else np.arange(self.nF)
#     return self

def __getitem__(self, index):
    index = self.indices[index]  # linear, shuffled, or image_weights

    hyp = self.hyp
    mosaic = self.mosaic and random.random() < hyp['mosaic']
    if mosaic:
        # Load mosaic
        img, labels = load_mosaic(self, index)
        shapes = None

        # MixUp augmentation
        if random.random() < hyp['mixup']:
            img, labels = mixup(img, labels, *load_mosaic(self, random.randint(0, self.n - 1)))

    else:
        # Load image
        img, (h0, w0), (h, w) = load_image(self, index)

        # Letterbox
        shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size  # final letterboxed shape
        img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment)
        shapes = (h0, w0), ((h / h0, w / w0), pad)  # for COCO mAP rescaling

        labels = self.labels[index].copy()
        if labels.size:  # normalized xywh to pixel xyxy format
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], ratio[0] * w, ratio[1] * h, padw=pad[0], padh=pad[1])

        if self.augment:
            img, labels = random_perspective(img, labels,
                                             degrees=hyp['degrees'],
                                             translate=hyp['translate'],
                                             scale=hyp['scale'],
                                             shear=hyp['shear'],
                                             perspective=hyp['perspective'])

    nl = len(labels)  # number of labels
    if nl:
        labels[:, 1:5] = xyxy2xywhn(labels[:, 1:5], w=img.shape[1], h=img.shape[0], clip=True, eps=1E-3)

    if self.augment:
        # Albumentations
        img, labels = self.albumentations(img, labels)
        nl = len(labels) # update after albumentations

        # HSV color-space
        augment_hsv(img, hgain=hyp['hsv_h'], sgain=hyp['hsv_s'], vgain=hyp['hsv_v'])

        # Flip up-down
        if random.random() < hyp['flipud']:
            img = np.flipud(img)
            if nl:
                labels[:, 2] = 1 - labels[:, 2]

        # Flip left-right
        if random.random() < hyp['fliplr']:
            img = np.fliplr(img)
            if nl:
                labels[:, 1] = 1 - labels[:, 1]

        # Cutouts
        # labels = cutout(img, labels, p=0.5)

    labels_out = torch.zeros((nl, 6))
    if nl:
        labels_out[:, 1:] = torch.from_numpy(labels)

    # Convert
    img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    img = np.ascontiguousarray(img)

    return torch.from_numpy(img), labels_out, self.img_files[index], shapes

@staticmethod
def collate_fn(batch):
    img, label, path, shapes = zip(*batch)  # transposed
    for i, l in enumerate(label):
        l[:, 0] = i  # add target image index for build_targets()
    return torch.stack(img, 0), torch.cat(label, 0), path, shapes

@staticmethod
def collate_fn4(batch):
    img, label, path, shapes = zip(*batch)  # transposed
    n = len(shapes) // 4
    img4, label4, path4, shapes4 = [], [], path[:n], shapes[:n]

    ho = torch.tensor([[0., 0, 0, 1, 0, 0]])
    wo = torch.tensor([[0., 0, 1, 0, 0, 0]])
    s = torch.tensor([[1, 1, .5, .5, .5, .5]])  # scale
    for i in range(n):  # zidane torch.zeros(16,3,720,1280)  # BCHW
        i *= 4
        if random.random() < 0.5:
            im = F.interpolate(img[i].unsqueeze(0).float(), scale_factor=2., mode='bilinear', align_corners=False)[
                0].type(img[i].type())
            l = label[i]
        else:
            im = torch.cat((torch.cat((img[i], img[i + 1]), 1), torch.cat((img[i + 2], img[i + 3]), 1)), 2)
            l = torch.cat((label[i], label[i + 1] + ho, label[i + 2] + wo, label[i + 3] + ho + wo), 0) * s
        img4.append(im)
        label4.append(l)

    for i, l in enumerate(label4):
        l[:, 0] = i  # add target image index for build_targets()

    return torch.stack(img4, 0), torch.cat(label4, 0), path4, shapes4

# Ancillary functions --------------------------------------------------------------------------------------------------

def load_image(self, i):

# loads 1 image from dataset index 'i', returns im, original hw, resized hw

im = self.imgs[i]
if im is None:  # not cached in ram
    npy = self.img_npy[i]
    if npy and npy.exists():  # load npy
        im = np.load(npy)
    else:  # read image
        path = self.img_files[i]
        im = cv2.imread(path)  # BGR
        assert im is not None, 'Image Not Found ' + path
    h0, w0 = im.shape[:2]  # orig hw
    r = self.img_size / max(h0, w0)  # ratio
    if r != 1:  # if sizes are not equal
        im = cv2.resize(im, (int(w0 * r), int(h0 * r)),
                        interpolation=cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR)
    return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized
else:
    return self.imgs[i], self.img_hw0[i], self.img_hw[i]  # im, hw_original, hw_resized

def load_mosaic(self, index):

# loads images in a 4-mosaic

labels4, segments4 = [], []
s = self.img_size
yc, xc = [int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border]  # mosaic center x, y
indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
for i, index in enumerate(indices):
    # Load image
    img, _, (h, w) = load_image(self, index)

    # place img in img4
    if i == 0:  # top left
        img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
    elif i == 1:  # top right
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
    elif i == 2:  # bottom left
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
    elif i == 3:  # bottom right
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

    img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
    padw = x1a - x1b
    padh = y1a - y1b

    # Labels
    labels, segments = self.labels[index].copy(), self.segments[index].copy()
    if labels.size:
        labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format
        segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
    labels4.append(labels)
    segments4.extend(segments)

# Concat/clip labels
labels4 = np.concatenate(labels4, 0)
for x in (labels4[:, 1:], *segments4):
    np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
# img4, labels4 = replicate(img4, labels4)  # replicate

# Augment
img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
img4, labels4 = random_perspective(img4, labels4, segments4,
                                   degrees=self.hyp['degrees'],
                                   translate=self.hyp['translate'],
                                   scale=self.hyp['scale'],
                                   shear=self.hyp['shear'],
                                   perspective=self.hyp['perspective'],
                                   border=self.mosaic_border)  # border to remove

return img4, labels4

def load_mosaic9(self, index):

# loads images in a 9-mosaic

labels9, segments9 = [], []
s = self.img_size
indices = [index] + random.choices(self.indices, k=8)  # 8 additional image indices
for i, index in enumerate(indices):
    # Load image
    img, _, (h, w) = load_image(self, index)

    # place img in img9
    if i == 0:  # center
        img9 = np.full((s * 3, s * 3, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
        h0, w0 = h, w
        c = s, s, s + w, s + h  # xmin, ymin, xmax, ymax (base) coordinates
    elif i == 1:  # top
        c = s, s - h, s + w, s
    elif i == 2:  # top right
        c = s + wp, s - h, s + wp + w, s
    elif i == 3:  # right
        c = s + w0, s, s + w0 + w, s + h
    elif i == 4:  # bottom right
        c = s + w0, s + hp, s + w0 + w, s + hp + h
    elif i == 5:  # bottom
        c = s + w0 - w, s + h0, s + w0, s + h0 + h
    elif i == 6:  # bottom left
        c = s + w0 - wp - w, s + h0, s + w0 - wp, s + h0 + h
    elif i == 7:  # left
        c = s - w, s + h0 - h, s, s + h0
    elif i == 8:  # top left
        c = s - w, s + h0 - hp - h, s, s + h0 - hp

    padx, pady = c[:2]
    x1, y1, x2, y2 = [max(x, 0) for x in c]  # allocate coords

    # Labels
    labels, segments = self.labels[index].copy(), self.segments[index].copy()
    if labels.size:
        labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padx, pady)  # normalized xywh to pixel xyxy format
        segments = [xyn2xy(x, w, h, padx, pady) for x in segments]
    labels9.append(labels)
    segments9.extend(segments)

    # Image
    img9[y1:y2, x1:x2] = img[y1 - pady:, x1 - padx:]  # img9[ymin:ymax, xmin:xmax]
    hp, wp = h, w  # height, width previous

# Offset
yc, xc = [int(random.uniform(0, s)) for _ in self.mosaic_border]  # mosaic center x, y
img9 = img9[yc:yc + 2 * s, xc:xc + 2 * s]

# Concat/clip labels
labels9 = np.concatenate(labels9, 0)
labels9[:, [1, 3]] -= xc
labels9[:, [2, 4]] -= yc
c = np.array([xc, yc])  # centers
segments9 = [x - c for x in segments9]

for x in (labels9[:, 1:], *segments9):
    np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
# img9, labels9 = replicate(img9, labels9)  # replicate

# Augment
img9, labels9 = random_perspective(img9, labels9, segments9,
                                   degrees=self.hyp['degrees'],
                                   translate=self.hyp['translate'],
                                   scale=self.hyp['scale'],
                                   shear=self.hyp['shear'],
                                   perspective=self.hyp['perspective'],
                                   border=self.mosaic_border)  # border to remove

return img9, labels9

def create_folder(path='./new'):

# Create folder

if os.path.exists(path):
    shutil.rmtree(path)  # delete output folder
os.makedirs(path)  # make new output folder

def flatten_recursive(path='../datasets/coco128'):

# Flatten a recursive directory by bringing all files to top level

new_path = Path(path + '_flat')
create_folder(new_path)
for file in tqdm(glob.glob(str(Path(path)) + '/**/*.*', recursive=True)):
    shutil.copyfile(file, new_path / Path(file).name)

def extract_boxes(path='../datasets/coco128'): # from utils.datasets import *; extract_boxes()

# Convert detection dataset into classification dataset, with one directory per class

path = Path(path)  # images dir
shutil.rmtree(path / 'classifier') if (path / 'classifier').is_dir() else None  # remove existing
files = list(path.rglob('*.*'))
n = len(files)  # number of files
for im_file in tqdm(files, total=n):
    if im_file.suffix[1:] in IMG_FORMATS:
        # image
        im = cv2.imread(str(im_file))[..., ::-1]  # BGR to RGB
        h, w = im.shape[:2]

        # labels
        lb_file = Path(img2label_paths([str(im_file)])[0])
        if Path(lb_file).exists():
            with open(lb_file, 'r') as f:
                lb = np.array([x.split() for x in f.read().strip().splitlines()], dtype=np.float32)  # labels

            for j, x in enumerate(lb):
                c = int(x[0])  # class
                f = (path / 'classifier') / f'{c}' / f'{path.stem}_{im_file.stem}_{j}.jpg'  # new filename
                if not f.parent.is_dir():
                    f.parent.mkdir(parents=True)

                b = x[1:] * [w, h, w, h]  # box
                # b[2:] = b[2:].max()  # rectangle to square
                b[2:] = b[2:] * 1.2 + 3  # pad
                b = xywh2xyxy(b.reshape(-1, 4)).ravel().astype(np.int)

                b[[0, 2]] = np.clip(b[[0, 2]], 0, w)  # clip boxes outside of image
                b[[1, 3]] = np.clip(b[[1, 3]], 0, h)
                assert cv2.imwrite(str(f), im[b[1]:b[3], b[0]:b[2]]), f'box failure in {f}'

def autosplit(path='../datasets/coco128/images', weights=(0.9, 0.1, 0.0), annotated_only=False):
    """ Autosplit a dataset into train/val/test splits and save path/autosplit_*.txt files
    Usage: from utils.datasets import *; autosplit()
    Arguments
        path:            Path to images directory
        weights:         Train, val, test weights (list, tuple)
        annotated_only:  Only use images with an annotated txt file
    """

path = Path(path)  # images dir
files = sum([list(path.rglob(f"*.{img_ext}")) for img_ext in IMG_FORMATS], [])  # image files only
n = len(files)  # number of files
random.seed(0)  # for reproducibility
indices = random.choices([0, 1, 2], weights=weights, k=n)  # assign each image to a split

txt = ['autosplit_train.txt', 'autosplit_val.txt', 'autosplit_test.txt']  # 3 txt files
[(path.parent / x).unlink(missing_ok=True) for x in txt]  # remove existing

#print(f'Autosplitting images from {path}' + ', using *.txt labeled images only' * annotated_only)
for i, img in tqdm(zip(indices, files), total=n):
    if not annotated_only or Path(img2label_paths([str(img)])[0]).exists():  # check label
        with open(path.parent / txt[i], 'a') as f:
            f.write('./' + img.relative_to(path.parent).as_posix() + '\n')  # add image to txt file

def verify_image_label(args):

# Verify one image-label pair

im_file, lb_file, prefix = args
nm, nf, ne, nc = 0, 0, 0, 0  # number missing, found, empty, corrupt
try:
    # verify images
    im = Image.open(im_file)
    im.verify()  # PIL verify
    shape = exif_size(im)  # image size
    assert (shape[0] > 9) & (shape[1] > 9), f'image size {shape} <10 pixels'
    assert im.format.lower() in IMG_FORMATS, f'invalid image format {im.format}'
    if im.format.lower() in ('jpg', 'jpeg'):
        with open(im_file, 'rb') as f:
            f.seek(-2, 2)
            assert f.read() == b'\xff\xd9', 'corrupted JPEG'

    # verify labels
    segments = []  # instance segments
    if os.path.isfile(lb_file):
        nf = 1  # label found
        with open(lb_file, 'r') as f:
            l = [x.split() for x in f.read().strip().splitlines() if len(x)]
            if any([len(x) > 8 for x in l]):  # is segment
                classes = np.array([x[0] for x in l], dtype=np.float32)
                segments = [np.array(x[1:], dtype=np.float32).reshape(-1, 2) for x in l]  # (cls, xy1...)
                l = np.concatenate((classes.reshape(-1, 1), segments2boxes(segments)), 1)  # (cls, xywh)
            l = np.array(l, dtype=np.float32)
        if len(l):
            assert l.shape[1] == 5, 'labels require 5 columns each'
            assert (l >= 0).all(), 'negative labels'
            assert (l[:, 1:] <= 1).all(), 'non-normalized or out of bounds coordinate labels'
            assert np.unique(l, axis=0).shape[0] == l.shape[0], 'duplicate labels'
        else:
            ne = 1  # label empty
            l = np.zeros((0, 5), dtype=np.float32)
    else:
        nm = 1  # label missing
        l = np.zeros((0, 5), dtype=np.float32)
    return im_file, l, shape, segments, nm, nf, ne, nc, ''
except Exception as e:
    nc = 1
    msg = f'{prefix}WARNING: Ignoring corrupted image and/or label {im_file}: {e}'
    return [None, None, None, None, nm, nf, ne, nc, msg]

def dataset_stats(path='coco128.yaml', autodownload=False, verbose=False, profile=False, hub=False):
    """ Return dataset statistics dictionary with images and instances counts per split per class
    To run in parent directory: export PYTHONPATH="$PWD/yolov5"
    Usage1: from utils.datasets import *; dataset_stats('coco128.yaml', autodownload=True)
    Usage2: from utils.datasets import *; dataset_stats('../datasets/coco128_with_yaml.zip')
    Arguments
        path:           Path to data.yaml or data.zip (with data.yaml inside data.zip)
        autodownload:   Attempt to download dataset if not found locally
        verbose:        Print stats dictionary
    """

def round_labels(labels):
    # Update labels to integer class and 6 decimal place floats
    return [[int(c), *[round(x, 4) for x in points]] for c, *points in labels]

def unzip(path):
    # Unzip data.zip TODO: CONSTRAINT: path/to/abc.zip MUST unzip to 'path/to/abc/'
    if str(path).endswith('.zip'):  # path is data.zip
        assert Path(path).is_file(), f'Error unzipping {path}, file not found'
        assert os.system(f'unzip -q {path} -d {path.parent}') == 0, f'Error unzipping {path}'
        dir = path.with_suffix('')  # dataset directory
        return True, str(dir), next(dir.rglob('*.yaml'))  # zipped, data_dir, yaml_path
    else:  # path is data.yaml
        return False, None, path

def hub_ops(f, max_dim=1920):
    # HUB ops for 1 image 'f'
    im = Image.open(f)
    r = max_dim / max(im.height, im.width)  # ratio
    if r < 1.0:  # image too large
        im = im.resize((int(im.width * r), int(im.height * r)))
    im.save(im_dir / Path(f).name, quality=75)  # save

zipped, data_dir, yaml_path = unzip(Path(path))
with open(check_file(yaml_path), errors='ignore') as f:
    data = yaml.safe_load(f)  # data dict
    if zipped:
        data['path'] = data_dir  # TODO: should this be dir.resolve()?
check_dataset(data, autodownload)  # download dataset if missing
hub_dir = Path(data['path'] + ('-hub' if hub else ''))
stats = {'nc': data['nc'], 'names': data['names']}  # statistics dictionary
for split in 'train', 'val', 'test':
    if data.get(split) is None:
        stats[split] = None  # i.e. no test set
        continue
    x = []
    dataset = LoadImagesAndLabels(data[split])  # load dataset
    for label in tqdm(dataset.labels, total=dataset.n, desc='Statistics'):
        x.append(np.bincount(label[:, 0].astype(int), minlength=data['nc']))
    x = np.array(x)  # shape(128x80)
    stats[split] = {'instance_stats': {'total': int(x.sum()), 'per_class': x.sum(0).tolist()},
                    'image_stats': {'total': dataset.n, 'unlabelled': int(np.all(x == 0, 1).sum()),
                                    'per_class': (x > 0).sum(0).tolist()},
                    'labels': [{str(Path(k).name): round_labels(v.tolist())} for k, v in
                               zip(dataset.img_files, dataset.labels)]}

    if hub:
        im_dir = hub_dir / 'images'
        im_dir.mkdir(parents=True, exist_ok=True)
        for _ in tqdm(ThreadPool(NUM_THREADS).imap(hub_ops, dataset.img_files), total=dataset.n, desc='HUB Ops'):
            pass

# Profile
stats_path = hub_dir / 'stats.json'
if profile:
    for _ in range(1):
        file = stats_path.with_suffix('.npy')
        t1 = time.time()
        np.save(file, stats)
        t2 = time.time()
        x = np.load(file, allow_pickle=True)
        #print(f'stats.npy times: {time.time() - t2:.3f}s read, {t2 - t1:.3f}s write')

        file = stats_path.with_suffix('.json')
        t1 = time.time()
        with open(file, 'w') as f:
            json.dump(stats, f)  # save stats *.json
        t2 = time.time()
        with open(file, 'r') as f:
            x = json.load(f)  # load hyps dict
        #print(f'stats.json times: {time.time() - t2:.3f}s read, {t2 - t1:.3f}s write')

# Save, #print and return
if hub:
    #print(f'Saving {stats_path.resolve()}...')
    with open(stats_path, 'w') as f:
        json.dump(stats, f)  # save stats.json
if verbose:
    print()
    #print(json.dumps(stats, indent=2, sort_keys=False))
return stats

#################################################################################################

glenn-jocher commented 3 years ago

@HeitorDC the camera Field Of View (FOV) characteristics have nothing to do with YOLOv5 and are dependent on the actual camera hardware used to record a given image, and the camera settings at time of capture (i.e. zoom).
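
That said, if you know (or query via pyrealsense2 intrinsics) the horizontal and vertical field of view of your colour stream, the yaw/pitch of a bounding-box centre relative to the optical axis can be approximated with a simple pinhole model. A minimal sketch, assuming the 640x480 stream used in the code above and nominal FOV values that you must replace with your own camera's figures (pixel_to_angles is a hypothetical helper, not part of YOLOv5):

import math

HFOV_DEG, VFOV_DEG = 69.0, 42.0  # assumed colour-stream field of view; check your camera's datasheet/intrinsics
FRAME_W, FRAME_H = 640, 480      # stream resolution used in LoadRealSense2 above

def pixel_to_angles(xcenter, ycenter):
    # Approximate yaw/pitch (degrees) of a pixel relative to the optical axis (pinhole model)
    fx = (FRAME_W / 2) / math.tan(math.radians(HFOV_DEG / 2))  # focal length in pixels (x)
    fy = (FRAME_H / 2) / math.tan(math.radians(VFOV_DEG / 2))  # focal length in pixels (y)
    dx = xcenter - FRAME_W / 2  # offset from the principal point, assumed at the image centre
    dy = ycenter - FRAME_H / 2
    yaw = math.degrees(math.atan2(dx, fx))    # positive to the right of the camera axis
    pitch = math.degrees(math.atan2(dy, fy))  # positive below the camera axis
    return yaw, pitch

print(pixel_to_angles(480, 120))  # e.g. a detection centred at (480, 120)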

HeitorDC commented 3 years ago

Hi, mr. @glenn-jocher, thanks for your answer!!

About that, there isn't any way to get the position of the central point of the bounding box in the frame with the detect code of the YOLOv5?

glenn-jocher commented 3 years ago

@HeitorDC oh, yes, if you just want the xcenter, ycenter of a YOLOv5 detection, do it as in the example below. See the PyTorch Hub tutorial for details. You can also use results.pandas().xywh[0] for results in xcenter, ycenter, width, height format.

Simple Example

This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the lightest and fastest YOLOv5 model. For details on all available models please see the README.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Image
img = 'https://ultralytics.com/images/zidane.jpg'

# Inference
results = model(img)

results.pandas().xyxy[0]
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 1  433.50  433.50   517.5  714.5    0.687988     27     tie
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie
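
Since the question is about xcenter/ycenter, the same results object also exposes them directly through its xywh view; a short follow-up, assuming the usual column names of that dataframe:

df = results.pandas().xywh[0]
print(df[['xcenter', 'ycenter', 'width', 'height']])  # centre coordinates plus box size per detection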


HeitorDC commented 3 years ago

Thanks for your answer, @glenn-jocher! But can I locate that point on the frame and print that location like "top right" or "bottom left" as I move the camera or the object?

glenn-jocher commented 3 years ago

@HeitorDC I don't know what you're asking. The value you wanted is obtained as in the example above. What you do with that value is up to you.
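
One way to turn that centre value into a label such as "top right" or "bottom left" is simply to compare it against the frame midlines. A minimal sketch (frame_position is a hypothetical helper; the default frame size matches the 640x480 RealSense stream used above):

def frame_position(xcenter, ycenter, frame_w=640, frame_h=480):
    # Label which quadrant of the frame a point falls in, e.g. 'top right'
    vertical = 'top' if ycenter < frame_h / 2 else 'bottom'
    horizontal = 'left' if xcenter < frame_w / 2 else 'right'
    return f'{vertical} {horizontal}'

print(frame_position(500, 100))  # -> 'top right'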

github-actions[bot] commented 2 years ago

πŸ‘‹ Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 πŸš€ and Vision AI ⭐!

HAN-007 commented 2 years ago

@HeitorDC I saw your question and tried to find an angle. You cannot find an angle from the center point of the detected box alone. Crop the box out of the picture and first convert it to grayscale with gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY). Then apply Canny to find edges; be careful with the parameters, otherwise it will not work properly, so optimize them for your own workspace: edges = cv2.Canny(gray, 50, 150, apertureSize=3). Create an empty list angles = [] and, using HoughLinesP, collect the angle each line in the box makes with the horizontal plane:

import math
import cv2
import numpy as np
from scipy import ndimage  # used for the rotation step below

lines = cv2.HoughLinesP(edges, 1, math.pi / 180.0, 90)
for [[x1, y1, x2, y2]] in lines:
    # cv2.line(img, (x1, y1), (x2, y2), (255, 0, 0), 3)
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle != 0:
        angles.append(angle)

Take the angle that the lines make most often (the median of the collected angles) as the angle of the box, then rotate the box by that angle:

median_angle = np.median(angles)
img = ndimage.rotate(img, median_angle)

I hope it solves your problem

Ramanmagar commented 1 year ago

When we use --save-crop, the files are saved in '.jpg' format by default. I want the crops saved as '.tif' files instead of jpg. What changes need to be made in detect.py?

glenn-jocher commented 11 months ago

@Ramanmagar to save crops in '.tif' format instead of '.jpg' in detect.py, you can modify the code that handles the saving of the cropped images. In the YOLOv5 detect.py file, find the section that saves the crop and modify the file extension accordingly. Look for the section that contains the code for saving the crop using the default '.jpg' extension and change it to '.tif'.

For example, if the code looks like this:

cv2.imwrite('crop.jpg', crop)

Change it to:

cv2.imwrite('crop.tif', crop)

This will ensure that the cropped images are saved in '.tif' format.
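
Note that in the detect_principal.py pasted above the crop is written by save_one_box rather than by a direct cv2.imwrite call, so the change has to reach whichever line ultimately writes the file. A hedged sketch, assuming your save_one_box passes the given filename through unchanged (some YOLOv5 versions force a '.jpg' suffix inside save_one_box itself, in which case that helper in utils must be edited instead):

if save_crop:
    # hypothetical one-line change: request a .tif filename instead of .jpg
    save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.tif', BGR=True)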

Remember to test the changes to ensure that the modified code is functioning as expected.