ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

How to write the detected number of people in the video? #242

Closed callme79 closed 4 years ago

callme79 commented 4 years ago

I'm new to coding. When I run detection, some objects are detected. I want to show the detected results in the video, such as a count of the number of people. What should I do? I don't know which variables are used.

github-actions[bot] commented 4 years ago

Hello @callme79, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.

For more information please visit https://www.ultralytics.com.

NanoCode012 commented 4 years ago

At this line you can find the class and the bounding box coordinates of each detection: https://github.com/ultralytics/yolov5/blob/0a08375a8a1183e0b1de49d21227ed085f0bd961/detect.py#L97

If you just want to count the number of detections per class in each frame, this line would do as well: https://github.com/ultralytics/yolov5/blob/0a08375a8a1183e0b1de49d21227ed085f0bd961/detect.py#L93
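
For illustration, a minimal sketch of reading those values at that point in detect.py (assuming the det tensor returned by non_max_suppression and the names list, both of which exist there):

# one row of `det` per detection: x1, y1, x2, y2, confidence, class
for *xyxy, conf, cls in det:
    x1, y1, x2, y2 = (int(v) for v in xyxy)  # box corners in image coordinates
    print(names[int(cls)], float(conf), (x1, y1, x2, y2))

# detections per class for the current frame
for c in det[:, -1].unique():
    n = int((det[:, -1] == c).sum())
    print(f'{n} {names[int(c)]}(s)')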

callme79 commented 4 years ago

Hi, how do I show how many people there are in the video? I'm still a bit unclear on this.

NanoCode012 commented 4 years ago

It's up to you how you want to display it. You can output the count to a txt file, or you can write it onto the image via OpenCV.

If you wish to do the latter, do it above this line: https://github.com/ultralytics/yolov5/blob/0a08375a8a1183e0b1de49d21227ed085f0bd961/detect.py#L130
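
For example, a minimal sketch of the OpenCV route, drawn just above that line (im0 is the current frame in detect.py; person_count is a hypothetical name for a counter you would maintain while looping over det):

import cv2

# person_count is an illustrative counter you increment per matching detection
cv2.putText(im0, f'people: {person_count}', (20, 50),
            cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 2)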

diego0718 commented 4 years ago

Hi @NanoCode012, I need to track my single-class object. To do this, I need to get the bounding boxes of the detected objects. I thought I could find them in the xyxy variable below line 95 of detect.py, in the "Write results" section, but I'm not sure. Could you explain, please?

NanoCode012 commented 4 years ago

Sure. Look at this line: https://github.com/ultralytics/yolov5/blob/0a08375a8a1183e0b1de49d21227ed085f0bd961/detect.py#L101

It gives the x, y, width, height, and class of each box. You can write code to check whether cls == person_class_number and, if so, increment a counter. Then you can write the count to a file (txt or json, whichever you prefer), or draw it on the frame via OpenCV.
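
A minimal sketch of that idea inside the detection loop of detect.py, assuming the pretrained COCO models where class 0 is 'person' (person_count and counts.txt are illustrative names, not part of the script):

person_count = 0
for *xyxy, conf, cls in det:
    if int(cls) == 0:  # class 0 is 'person' for the pretrained COCO models
        person_count += 1

# write the count to a text file ...
with open('counts.txt', 'a') as f:
    f.write(f'{p} {person_count}\n')  # `p` is the current image/stream path in detect.py

# ... or draw it onto the frame with OpenCV (cv2.putText on im0), as shown above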

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

swethabethireddy commented 4 years ago

How do I get the count of objects in a video?

glenn-jocher commented 4 years ago

@swethabethireddy see this Python code in detect.py:

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += '%g %ss, ' % (n, names[int(c)])  # add to string
xHUGx commented 3 years ago

So "n" is doing the counting, but how do I show it in the detected video?

glenn-jocher commented 3 years ago

@xHUGx you would customize detect.py to suit your requirements, I assume.

SpongeBab commented 3 years ago

This is simple. Here's what I did. Note: my custom dataset has a single class and I only tested this on my own dataset; if the number of classes is not 1, there may be an error.

                count = 0
                for *xyxy, conf, cls in reversed(det):
                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if opt.save_conf else (cls, *xywh)  # label format
                        with open(txt_path + '.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img or opt.save_crop or view_img:  # Add bbox to image
                        c = int(cls)  # integer class
                        # label = None if opt.hide_labels else (names[c] if opt.hide_conf else f'{names[c]} {conf:.2f}')
                        label = str(count)
                        # plot_one_box(xyxy, im0, label=label, color=colors(c, True), line_thickness=opt.line_thickness)
                        plot_one_box(xyxy, im0, label=label, color=colors(c, True), line_thickness=1)
                        count += 1
                        if opt.save_crop:
                            save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

And then run 'detect.py'.

PraveenGit3 commented 3 years ago

Wow, excellent, it's working! But the problem is that it only gives an ID for each bounding box; I need to include the class name too, like "1 Book", "2 Person".

SureshbabuAkash1999 commented 3 years ago

> This is simple. Here's what I did. [@SpongeBab's code snippet quoted above]

Where should I add this code in detect.py?

SpongeBab commented 3 years ago

Here: https://github.com/ultralytics/yolov5/blob/master/detect.py#L196

Uvindu98 commented 3 years ago

In my case, I want to detect people, and if the people count is more than 1, a warning must be displayed on the video. How can I do that?

SureshbabuAkash1999 commented 3 years ago

Okay, to display the number of people counted on the image, copy the code below and paste it into detect.py.

"""

YOLOv5 πŸš€ by Ultralytics, GPL-3.0 license

""" Run inference on images, videos, directories, streams, etc.

Usage: $ python path/to/detect.py --source path/to/img.jpg --weights yolov5s.pt --img 640 """

import argparse import sys import time from pathlib import Path

import cv2 import numpy as np import torch import torch.backends.cudnn as cudnn

FILE = Path(file).absolute() sys.path.append(FILE.parents[0].as_posix()) # add yolov5/ to path

from models.experimental import attempt_load from utils.datasets import LoadStreams, LoadImages from utils.general import check_img_size, check_requirements, check_imshow, colorstr, is_ascii, non_max_suppression, \ apply_classifier, scale_coords, xyxy2xywh, strip_optimizer, set_logging, increment_path, save_one_box from utils.plots import Annotator, colors from utils.torch_utils import select_device, load_classifier, time_sync

@torch.no_grad() def run(weights='yolov5s.pt', # model.pt path(s) source='data/images', # file/dir/URL/glob, 0 for webcam imgsz=640, # inference size (pixels) conf_thres=0.25, # confidence threshold iou_thres=0.45, # NMS IOU threshold max_det=1000, # maximum detections per image device='', # cuda device, i.e. 0 or 0,1,2,3 or cpu view_img=False, # show results save_txt=False, # save results to *.txt save_conf=False, # save confidences in --save-txt labels save_crop=False, # save cropped prediction boxes nosave=False, # do not save images/videos classes=None, # filter by class: --class 0, or --class 0 2 3 agnostic_nms=False, # class-agnostic NMS augment=False, # augmented inference visualize=False, # visualize features update=False, # update all models project='runs/detect', # save results to project/name name='exp', # save results to project/name exist_ok=False, # existing project/name ok, do not increment line_thickness=3, # bounding box thickness (pixels) hide_labels=False, # hide labels hide_conf=False, # hide confidences half=False, # use FP16 half-precision inference ): save_img = not nosave and not source.endswith('.txt') # save inference images webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith( ('rtsp://', 'rtmp://', 'http://', 'https://'))

# Directories
save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
(save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

# Initialize
set_logging()
device = select_device(device)
half &= device.type != 'cpu'  # half precision only supported on CUDA

# Load model
w = weights[0] if isinstance(weights, list) else weights
classify, suffix = False, Path(w).suffix.lower()
pt, onnx, tflite, pb, saved_model = (suffix == x for x in ['.pt', '.onnx', '.tflite', '.pb', ''])  # backend
stride, names = 64, [f'class{i}' for i in range(1000)]  # assign defaults
if pt:
    model = attempt_load(weights, map_location=device)  # load FP32 model
    stride = int(model.stride.max())  # model stride
    names = model.module.names if hasattr(model, 'module') else model.names  # get class names
    if half:
        model.half()  # to FP16
    if classify:  # second-stage classifier
        modelc = load_classifier(name='resnet50', n=2)  # initialize
        modelc.load_state_dict(torch.load('resnet50.pt', map_location=device)['model']).to(device).eval()
elif onnx:
    check_requirements(('onnx', 'onnxruntime'))
    import onnxruntime
    session = onnxruntime.InferenceSession(w, None)
else:  # TensorFlow models
    check_requirements(('tensorflow>=2.4.1',))
    import tensorflow as tf
    if pb:  # https://www.tensorflow.org/guide/migrate#a_graphpb_or_graphpbtxt
        def wrap_frozen_graph(gd, inputs, outputs):
            x = tf.compat.v1.wrap_function(lambda: tf.compat.v1.import_graph_def(gd, name=""), [])  # wrapped import
            return x.prune(tf.nest.map_structure(x.graph.as_graph_element, inputs),
                           tf.nest.map_structure(x.graph.as_graph_element, outputs))

        graph_def = tf.Graph().as_graph_def()
        graph_def.ParseFromString(open(w, 'rb').read())
        frozen_func = wrap_frozen_graph(gd=graph_def, inputs="x:0", outputs="Identity:0")
    elif saved_model:
        model = tf.keras.models.load_model(w)
    elif tflite:
        interpreter = tf.lite.Interpreter(model_path=w)  # load TFLite model
        interpreter.allocate_tensors()  # allocate
        input_details = interpreter.get_input_details()  # inputs
        output_details = interpreter.get_output_details()  # outputs
        int8 = input_details[0]['dtype'] == np.uint8  # is TFLite quantized uint8 model
imgsz = check_img_size(imgsz, s=stride)  # check image size
ascii = is_ascii(names)  # names are ascii (use PIL for UTF-8)

# Dataloader
if webcam:
    view_img = check_imshow()
    cudnn.benchmark = True  # set True to speed up constant image size inference
    dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt)
    bs = len(dataset)  # batch_size
else:
    dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt)
    bs = 1  # batch_size
vid_path, vid_writer = [None] * bs, [None] * bs

# Run inference
if pt and device.type != 'cpu':
    model(torch.zeros(1, 3, *imgsz).to(device).type_as(next(model.parameters())))  # run once
t0 = time.time()
for path, img, im0s, vid_cap in dataset:
    if onnx:
        img = img.astype('float32')
    else:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
    img = img / 255.0  # 0 - 255 to 0.0 - 1.0
    if len(img.shape) == 3:
        img = img[None]  # expand for batch dim

    # Inference
    t1 = time_sync()
    if pt:
        visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
        pred = model(img, augment=augment, visualize=visualize)[0]
    elif onnx:
        pred = torch.tensor(session.run([session.get_outputs()[0].name], {session.get_inputs()[0].name: img}))
    else:  # tensorflow model (tflite, pb, saved_model)
        imn = img.permute(0, 2, 3, 1).cpu().numpy()  # image in numpy
        if pb:
            pred = frozen_func(x=tf.constant(imn)).numpy()
        elif saved_model:
            pred = model(imn, training=False).numpy()
        elif tflite:
            if int8:
                scale, zero_point = input_details[0]['quantization']
                imn = (imn / scale + zero_point).astype(np.uint8)  # de-scale
            interpreter.set_tensor(input_details[0]['index'], imn)
            interpreter.invoke()
            pred = interpreter.get_tensor(output_details[0]['index'])
            if int8:
                scale, zero_point = output_details[0]['quantization']
                pred = (pred.astype(np.float32) - zero_point) * scale  # re-scale
        pred[..., 0] *= imgsz[1]  # x
        pred[..., 1] *= imgsz[0]  # y
        pred[..., 2] *= imgsz[1]  # w
        pred[..., 3] *= imgsz[0]  # h
        pred = torch.tensor(pred)

    # NMS
    pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
    t2 = time_sync()

    # Second-stage classifier (optional)
    if classify:
        pred = apply_classifier(pred, modelc, img, im0s)

    # Process predictions
    for i, det in enumerate(pred):  # detections per image
        if webcam:  # batch_size >= 1
            p, s, im0, frame = path[i], f'{i}: ', im0s[i].copy(), dataset.count
        else:
            p, s, im0, frame = path, '', im0s.copy(), getattr(dataset, 'frame', 0)

        p = Path(p)  # to Path
        save_path = str(save_dir / p.name)  # img.jpg
        txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
        #s += '%gx%g ' % img.shape[2:]  # print string
        gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
        imc = im0.copy() if save_crop else im0  # for save_crop
        annotator = Annotator(im0, line_width=line_thickness, pil=not ascii)
        if len(det):
            # Rescale boxes from img_size to im0 size
            det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

            # Print results
            for c in det[:, -1].unique():
                n = (det[:, -1] == c).sum()  # detections per class
                global akash  # module-level variable holding the latest count
                akash = n
                s = f"{akash}"  # add to string
                print(s)
                cv2.putText(im0, "People " + str(s), (20, 50), 0, 2, (100, 200, 0), 3)  # extra added
            # Write results
            for *xyxy, conf, cls in reversed(det):
                if save_txt:  # Write to file
                    xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                    line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
                    with open(txt_path + '.txt', 'a') as f:
                        f.write(('%g ' * len(line)).rstrip() % line + '\n')

                if save_img or save_crop or view_img:  # Add bbox to image
                    c = int(cls)  # integer class
                    label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                    annotator.box_label(xyxy, label, color=colors(c, True))
                    if save_crop:
                        save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)

        # Print time (inference + NMS)
        #print(f'{s}Done. ({t2 - t1:.3f}s)')

        # Stream results
        im0 = annotator.result()
        if view_img:
            cv2.imshow(str(p), im0)
            cv2.waitKey(1)  # 1 millisecond

        # Save results (image with detections)
        if save_img:
            if dataset.mode == 'image':
                cv2.imwrite(save_path, im0)
            else:  # 'video' or 'stream'
                if vid_path[i] != save_path:  # new video
                    vid_path[i] = save_path
                    if isinstance(vid_writer[i], cv2.VideoWriter):
                        vid_writer[i].release()  # release previous video writer
                    if vid_cap:  # video
                        fps = vid_cap.get(cv2.CAP_PROP_FPS)
                        w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                        h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                    else:  # stream
                        fps, w, h = 30, im0.shape[1], im0.shape[0]
                        save_path += '.mp4'
                    vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
                vid_writer[i].write(im0)

if save_txt or save_img:
    s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
    print(f"Results saved to {colorstr('bold', save_dir)}{s}")

if update:
    strip_optimizer(weights)  # update model (to fix SourceChangeWarning)

print(f'Done. ({time.time() - t0:.3f}s)')

def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='data/images', help='file/dir/URL/glob, 0 for webcam')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-crop', action='store_true', help='save cropped prediction boxes')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--visualize', action='store_true', help='visualize features')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--line-thickness', default=3, type=int, help='bounding box thickness (pixels)')
    parser.add_argument('--hide-labels', default=False, action='store_true', help='hide labels')
    parser.add_argument('--hide-conf', default=False, action='store_true', help='hide confidences')
    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    return opt


def main(opt):
    print(colorstr('detect: ') + ', '.join(f'{k}={v}' for k, v in vars(opt).items()))
    check_requirements(exclude=('tensorboard', 'thop'))
    run(**vars(opt))


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)

SureshbabuAkash1999 commented 3 years ago

Here, the line with the "extra added" comment is what displays the count on the image. Tweak it according to your needs.
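
For instance, to address the earlier question about displaying a warning when more than one person is detected, a possible tweak is to add something like this just below that line (the threshold and message text are illustrative, not part of the original script):

                if int(n) > 1:  # n is the per-class count computed just above
                    cv2.putText(im0, 'WARNING: more than one person detected', (20, 100),
                                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)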

SpongeBab commented 3 years ago

> Excellent, it's working! But the problem is that it only gives an ID for each bounding box; I need to include the class name too, like "1 Book", "2 Person".

Oh, that needs more work... you can try it.

yh-99 commented 2 years ago

good job thanks!

ding-yu-chuan commented 2 years ago

Hi, how do I count the total number of objects and display it on the pictures that I detect? I'm a little unclear about it.

makraimit commented 2 years ago

How can I generate metrics like the total count, and the total number of seconds an object was visible in the video?

glenn-jocher commented 2 years ago

@makraimit πŸ‘‹ Hello! Thanks for asking about handling inference results. YOLOv5 πŸš€ PyTorch Hub models allow for simple model loading and inference in a pure python environment without using detect.py.

Simple Inference Example

This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the YOLOv5 'small' model. For details on all available models please see the README. Custom models can also be loaded, including custom trained PyTorch models and their exported variants, i.e. ONNX, TensorRT, TensorFlow, OpenVINO YOLOv5 models.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # yolov5n - yolov5x6 official model
#                                            'custom', 'path/to/best.pt')  # custom model

# Images
im = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, URL, PIL, OpenCV, numpy, list

# Inference
results = model(im)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.
results.xyxy[0]  # im predictions (tensor)

results.pandas().xyxy[0]  # im predictions (pandas)
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie

results.pandas().xyxy[0].value_counts('name')  # class counts (pandas)
# person    2
# tie       1

See YOLOv5 PyTorch Hub Tutorial for details.

Good luck πŸ€ and let us know if you have any other questions!

makraimit commented 2 years ago

Thanks @glenn-jocher for the quick reply. Is it possible to get the portion of the frame occupied by a particular object in each frame, and to average that over the video?

glenn-jocher commented 1 year ago

@makraimit yes, it is possible to determine the portion of the frame occupied by a specific object in each frame using YOLOv5. Each detection is returned as a bounding box, so you can divide a box's area by the frame's area to get the fraction of the frame it covers, and then average those per-frame fractions over the whole video. (If you later need to deploy outside Python, the model can also be exported to TorchScript for C++, mobile, and other targets, but that export is not required for this measurement.)

For example, to measure the portion of a single frame occupied by each detected object, something along these lines should work with the PyTorch Hub model:

import cv2
import torch

# Model (use 'custom', path='model_best.pt' for a custom model)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# One frame, e.g. a local copy of https://ultralytics.com/images/bus.jpg (BGR numpy array)
im = cv2.imread('bus.jpg')
frame_area = im.shape[0] * im.shape[1]  # height * width in pixels

# Inference
results = model(im[..., ::-1])  # BGR -> RGB
det = results.pandas().xyxy[0]  # detections as a pandas DataFrame

# Fraction of the frame covered by each detection
det['area_fraction'] = (det.xmax - det.xmin) * (det.ymax - det.ymin) / frame_area
print(det[['name', 'confidence', 'area_fraction']])

To average over a video, run the model frame by frame, accumulate the per-frame fractions for the class you care about, and divide by the number of frames.
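
A rough sketch of that per-video averaging with OpenCV, assuming a readable file video.mp4 and the Hub model loaded as above (file name and variable names are illustrative; overlapping boxes are simply summed here, not merged):

import cv2

cap = cv2.VideoCapture('video.mp4')  # illustrative path
fractions = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    det = model(frame[..., ::-1]).pandas().xyxy[0]  # BGR -> RGB, detections as a DataFrame
    person = det[det['name'] == 'person']
    box_area = ((person.xmax - person.xmin) * (person.ymax - person.ymin)).sum()
    fractions.append(box_area / (frame.shape[0] * frame.shape[1]))
cap.release()

if fractions:
    print(f'average fraction of the frame covered by people: {sum(fractions) / len(fractions):.3f}')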

Feel free to explore the YOLOv5 PyTorch Hub Tutorial to better understand the process.

Should you have further questions or need additional assistance, don't hesitate to ask!