ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Poor bounding box localization for objects #7192

Closed: purvang3 closed this issue 2 years ago

purvang3 commented 2 years ago

While analyzing validation images, and also after converting the trained model to ONNX and visualizing predictions, I see that the bounding boxes are off by quite a margin from the ground-truth boxes. During training, the loss landscape and mAP metric appear to behave as expected, with the following COCO metric results.

[Screenshot: COCO evaluation metrics]

Below is the code snippet I am using for predictions.

""" self.model = converted onnx model.

    resized = letterbox(img_np).    # HWC
    img_in = resized[0].astype(np.float32) / 255.0
    img_in = np.expand_dims(np.moveaxis(img_in, -1, 0), axis=0)  # HWC -> CHW 
    input_name = self.model.get_inputs()[0].name
    out = self.model.run(None, {input_name: img_in})
    out = non_max_suppression(torch.from_numpy(out[0]), 0.10, 0.60)

   #. img_in : CHW
   # img_np : HWC

    scale_coords(img_in.shape[2:], out[0][:4], img_np.shape[:2])
    boxes = xyxy2xywh(out[0][:4])
    boxes[:, :2] -= boxes[:, 2:] / 2
    out_list = out[0].tolist()

    image_src = Image.fromarray(img_np)

    draw = ImageDraw.Draw(image_src)
    for i, out in enumerate(zip(boxes.tolist(), out_list)):
        shape, cls_list = out[0][:4], out[0][4:]
        draw.rectangle([(shape[0], shape[1]), (shape[2]+shape[0], shape[3]+shape[1])], outline="red")
        draw.text(xy=[shape[0], shape[1] - 3],
                  text=f"{str(int(cls_list[1])), cls_list[0]}",
                  font=font)

"""

Is there any problem with the visualization? What are the recommendations to improve localization? Let me know if you need more information.

Thank you

glenn-jocher commented 2 years ago

@purvang3 👋 Hello! Thanks for asking about handling inference results. YOLOv5 🚀 PyTorch Hub models allow for simple model loading and inference in a pure Python environment without using detect.py.

Simple Inference Example

This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the YOLOv5 'small' model. For details on all available models please see the README. Custom models can also be loaded, including custom trained PyTorch models and their exported variants, i.e. ONNX, TensorRT, TensorFlow, OpenVINO YOLOv5 models.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5m, yolov5l, yolov5x, etc.
# model = torch.hub.load('ultralytics/yolov5', 'custom', 'path/to/best.pt')  # custom trained model

# Images
im = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, URL, PIL, OpenCV, numpy, list

# Inference
results = model(im)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.

results.xyxy[0]  # im predictions (tensor)
results.pandas().xyxy[0]  # im predictions (pandas)
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie

See YOLOv5 PyTorch Hub Tutorial for details.
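
Since the question above concerns an exported ONNX model: as noted, exported variants load through the same 'custom' entry point, and the Hub wrapper handles letterboxing, rescaling back to original image coordinates, and NMS for you. A minimal sketch (the weights path is illustrative):

model = torch.hub.load('ultralytics/yolov5', 'custom', 'path/to/best.onnx')  # exported ONNX weights
results = model(im)  # preprocessing and NMS handled by the wrapper
results.print()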

Good luck 🍀 and let us know if you have any other questions!

purvang3 commented 2 years ago

@glenn-jocher Thank you for your reply. It seems the input format was the problem behind the bad localization. Providing annotations in CxCyWH format solves the issue. Could you explain the reason for using a CxCyWH-based annotation format rather than XYWH?

yangrisheng commented 2 years ago

> @glenn-jocher Thank you for your reply. It seems the input format was the problem behind the bad localization. Providing annotations in CxCyWH format solves the issue. Could you explain the reason for using a CxCyWH-based annotation format rather than XYWH?

I think it is a characteristic of the YOLO series; YOLO uses CxCyWH throughout.
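
To make the distinction concrete, here is a minimal sketch of the conversion (the helper name xywh_to_cxcywh is illustrative, not a repo function); YOLO labels use the box center, normalized by image size, rather than the top-left corner in pixels:

def xywh_to_cxcywh(box, img_w, img_h):
    # Top-left (x, y, w, h) in pixels -> normalized (x_center, y_center, w, h)
    x, y, w, h = box
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

print(xywh_to_cxcywh([100, 50, 200, 100], img_w=640, img_h=480))
# [0.3125, 0.2083..., 0.3125, 0.2083...]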

glenn-jocher commented 2 years ago

@purvang3 @yangrisheng 👋 Hello! Thanks for asking about YOLOv5 🚀 dataset formatting. To train correctly, your data must be in YOLOv5 format. Please see our Train Custom Data tutorial for full documentation on dataset setup and all steps required to start training your first model. A few excerpts from the tutorial:

1.1 Create dataset.yaml

COCO128 is an example small tutorial dataset composed of the first 128 images in COCO train2017. These same 128 images are used for both training and validation to verify our training pipeline is capable of overfitting. data/coco128.yaml, shown below, is the dataset config file that defines 1) the dataset root directory path and relative paths to train / val / test image directories (or *.txt files with image paths), 2) the number of classes nc and 3) a list of class names:

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128  # dataset root dir
train: images/train2017  # train images (relative to 'path') 128 images
val: images/train2017  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes
nc: 80  # number of classes
names: [ 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
         'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
         'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
         'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
         'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
         'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
         'hair drier', 'toothbrush' ]  # class names
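
A quick sanity check on the config (a sketch using PyYAML; this is not YOLOv5's own validation code) is to confirm that nc matches the number of entries in names:

import yaml

with open('data/coco128.yaml') as f:
    data = yaml.safe_load(f)
assert data['nc'] == len(data['names']), 'nc must equal the number of class names'
print(data['train'], data['val'])  # images/train2017 images/train2017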

1.2 Create Labels

After using a tool like Roboflow Annotate to label your images, export your labels to YOLO format, with one *.txt file per image (if there are no objects in an image, no *.txt file is required). The *.txt file specifications are:

- One row per object
- Each row is class x_center y_center width height format
- Box coordinates must be in normalized xywh format (from 0 to 1). If your boxes are in pixels, divide x_center and width by image width, and y_center and height by image height.
- Class numbers are zero-indexed (start from 0)

[Image: labeled example image with its YOLO-format annotations]

The label file corresponding to the above image contains 2 persons (class 0) and a tie (class 27):
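
The original label file is not reproduced here, but its shape follows directly from the spec above: two person rows (class 0) and one tie row (class 27), each as class x_center y_center width height normalized to 0-1. A sketch with illustrative values only:

0 0.48 0.63 0.69 0.71
0 0.74 0.52 0.31 0.93
27 0.36 0.80 0.14 0.22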

1.3 Organize Directories

Organize your train and val images and labels according to the example below. YOLOv5 assumes /coco128 is inside a /datasets directory next to the /yolov5 directory. YOLOv5 locates labels automatically for each image by replacing the last instance of /images/ in each image path with /labels/. For example:

../datasets/coco128/images/im0.jpg  # image
../datasets/coco128/labels/im0.txt  # label
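
A minimal sketch of that path substitution (the helper name img2label_path is illustrative; YOLOv5 implements the same convention internally):

def img2label_path(img_path: str) -> str:
    # Replace the last '/images/' segment with '/labels/' and swap the
    # image suffix for '.txt', mirroring how YOLOv5 locates label files.
    i = img_path.rfind('/images/')
    stem = img_path[i + len('/images/'):].rsplit('.', 1)[0]
    return img_path[:i] + '/labels/' + stem + '.txt'

print(img2label_path('../datasets/coco128/images/im0.jpg'))
# ../datasets/coco128/labels/im0.txt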

Good luck 🍀 and let us know if you have any other questions!