how to use from another "app"

First i want to say thanks for all the great work. I'm just trying to use yolov5 in my own system and I'm having trouble using it. In my code i find i have to write a jpg to use it.

I can't make it work without writing a jpg to the file system and opening it again. and that takes longer than the inference it's slef. Can someone show me or point to me where I can find the answer. Been trying all day and this code works perfectly except the writing and reading part being slow.

Thanks in advance.

`def yolo_detect(source, new_conf_thres, new_iou_thres):

t0 = time.time()

cv2.imwrite('tmp.jpg',source)

dataset = LoadImages('tmp.jpg', img_size=imgsz, stride=stride)`

more context.....

`def yolo_model(new_weights): global model global stride global imgsz global names global device global half global view_img global save_txt global webcam global save_dir global save_img

weights, view_img, save_txt, imgsz = new_weights, opt.view_img, opt.save_txt, opt.img_size

save_img = False    
webcam = False

# Initialize
set_logging()
try:
    device = select_device(opt.device)
except:
    print('device already allocated')
half = device.type != 'cpu'  # half precision only supported on CUDA

# Directories
save_dir = increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok)  # increment run
(save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

# Load model
model = attempt_load(new_weights, map_location=device)  # load FP32 model
stride = int(model.stride.max())  # model stride
imgsz = check_img_size(imgsz, s=stride)  # check img_size
names = model.module.names if hasattr(model, 'module') else model.names  # get class names
if half:
    model.half()  # to FP16

def yolo_detect(source, new_conf_thres, new_iou_thres):

t0 = time.time()

cv2.imwrite('tmp.jpg',source)

dataset = LoadImages('tmp.jpg', img_size=imgsz, stride=stride)

r = []

# Run inference
if device.type != 'cpu':
    model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
t1 = time.time()
for path, img, im0s, vid_cap in dataset:
    img = torch.from_numpy(img).to(device)
    img = img.half() if half else img.float()  # uint8 to fp16/32
    img /= 255.0  # 0 - 255 to 0.0 - 1.0
    if img.ndimension() == 3:
        img = img.unsqueeze(0)

    # Inference
    t1 = time_synchronized()
    pred = model(img, augment=opt.augment)[0]

    # Apply NMS
    pred = non_max_suppression(pred, new_conf_thres, new_iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
    t2 = time_synchronized()

    # Process detections
    for i, det in enumerate(pred):  # detections per image
        if webcam:  # batch_size >= 1
            p, s, im0, frame = path[i], f'{i}: ', im0s[i].copy(), dataset.count
        else:
            p, s, im0, frame = path, '', im0s.copy(), getattr(dataset, 'frame', 0)

        p = Path(p)  # to Path
        save_path = str(save_dir / p.name)  # img.jpg
        txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
        s += '%gx%g ' % img.shape[2:]  # print string
        gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
        imc = im0.copy() if opt.save_crop else im0  # for opt.save_crop

        if len(det):
            # Rescale boxes from img_size to im0 size
            det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

            # Print results
            for c in det[:, -1].unique():
                n = (det[:, -1] == c).sum()  # detections per class
                s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

Hi please format your code properly next time, that will make things easier. I guess something like this works for an image. And whether you need cv2.cvtColor(source, cv2.COLOR_BGR2RGB) depends on how you read source. I've simplified the loop a bit to the point where only the bounding box is drawn on im0s. This code is not tested

color = (255, 0, 0)
thickness = 1
t0 = time.time()

r = []

# Run inference
if device.type != 'cpu':
    model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
t1 = time.time()

im0s = cv2.cvtColor(source, cv2.COLOR_BGR2RGB)

img = letterbox(im0s, imgsz, stride=stride)[0]
img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416
img = np.ascontiguousarray(img)

img = torch.from_numpy(img).to(device)
img = img.float()
img /= 255.0  # 0 - 255 to 0.0 - 1.0
if img.ndimension() == 3:
    img = img.unsqueeze(0)

# Inference
t1 = time_synchronized()
pred = model(img, augment=opt.augment)[0]

# Apply NMS
pred = non_max_suppression(pred, new_conf_thres, new_iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
t2 = time_synchronized()

# Process detections
for i, det in enumerate(pred):
      if len(det):
          det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()
          for *xyxy, conf, cls in reversed(det):
              c1, c2 = (int(xyxy[0]), int(xyxy[1])), (int(xyxy[2]), int(xyxy[3]))
              im0s = cv2.rectangle(im0s, c1, c2, color, thickness)

cv2.imshow("yikes", im0s)
cv2.waitKey(1)

Thanks for the help. I've got it working with minor tweaks. See Below. The problem is that i only gained 25ms on my Nvidia Jetson nano. The first steps (t0), take 90ms and the actual inference takes another 90ms so total time is 180ms or ~5 FPS. I thought that the previous 90ms was writing to the SD card. But obviously there is quite some time in prepping the image. Any tips? is it possible to do away with numpy steps? I assume that is the speed issue. Thanks in advance and I'm sorry for not formatting the code because I couldn't find how.

Also what does this line do "img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416" Because my image size is 640?

def yolo_detect(source, new_conf_thres, new_iou_thres):

    t0 = time.time()

    r = []

    # Run inference
    if device.type != 'cpu':
        model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
    t1 = time.time()

    im0 = source

    img = letterbox_image(im0, imgsz)
    img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416
    img = np.ascontiguousarray(img)

    img = torch.from_numpy(img).to(device)
    img = img.half() if half else img.float()  # uint8 to fp16/32
    img /= 255.0  # 0 - 255 to 0.0 - 1.0
    if img.ndimension() == 3:
        img = img.unsqueeze(0)

    # Inference
    t1 = time_synchronized()
    pred = model(img, augment=opt.augment)[0]

    # Apply NMS
    pred = non_max_suppression(pred, new_conf_thres, new_iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
    t2 = time_synchronized()
        # Process detections
    for i, det in enumerate(pred):  # detections per image
        gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
        imc = im0.copy() if opt.save_crop else im0  # for opt.save_crop

        if len(det):
            # Rescale boxes from img_size to im0 size
            det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

            # Write results
            for *xyxy, conf, cls in reversed(det):

                r.append({'name':f'{names[int(cls)]}',
                            'conf':f'{conf:.2f}',
                            'x':int(float(f'{xyxy[0]:.2f}')),
                            'y':int(float(f'{xyxy[1]:.2f}')),
                            'w':int(float(f'{xyxy[2]:.2f}') - float(f'{xyxy[0]:.2f}')),
                            'h':int(float(f'{xyxy[3]:.2f}') - float(f'{xyxy[1]:.2f}')),
                            })

    print(f'Done. ({time.time() - t0:.3f}s)')
    print(f'Done. ({time.time() - t1:.3f}s)')

    return r

The comment

3x416x416

was only copy pasted from the detect.py, you can ignore that. img[:, :, ::-1].transpose(2, 0, 1) expects your image to be in BGR format and converts it to RGB. transpose(2, 0, 1) needs to be done since pytorch convolutions expect the channels to be at the first dimension, therefore the image shape is transformed to (channl,width,heigt).

@xrstokes none of the operations you are showing here are required for YOLOv5 inference. I would simply load a YOLOv5 PyTorch Hub models and pass it your image, nothing else required. See PyTorch Hub tutorial to get started.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Image
img = 'https://ultralytics.com/images/zidane.jpg'

# Inference
results = model(img)
results.print()  # or .show(), .save()

YOLOv5 Tutorials

Train Custom Data 🚀 RECOMMENDED
Tips for Best Training Results ☘️ RECOMMENDED
Weights & Biases Logging 🌟 NEW
Supervisely Ecosystem 🌟 NEW
Multi-GPU Training
PyTorch Hub ⭐ NEW
TorchScript, ONNX, CoreML Export 🚀
Test-Time Augmentation (TTA)
Model Ensembling
Model Pruning/Sparsity
Hyperparameter Evolution
Transfer Learning with Frozen Layers ⭐ NEW
TensorRT Deployment

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Wiki – https://github.com/ultralytics/yolov5/wiki
Tutorials – https://docs.ultralytics.com/yolov5
Docs – https://docs.ultralytics.com

Access additional Ultralytics ⚡ resources:

Ultralytics HUB – https://ultralytics.com
Vision API – https://ultralytics.com/yolov5
About Us – https://ultralytics.com/about
Join Our Team – https://ultralytics.com/work
Contact Us – https://ultralytics.com/contact

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

ultralytics / yolov5

how to use from another "app" #3225

YOLOv5 Tutorials