ultralytics / ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Metrics calculation #16226

Open · JoaoLopesFerreira99 opened 6 days ago

JoaoLopesFerreira99 commented 6 days ago

Search before asking

Question

import os
import cv2
import numpy as np
from ultralytics import YOLO

def convert_labels_to_boxes(label_path, image_shape):
    """Converts label format (class, x_center, y_center, width, height) to (x1, y1, x2, y2)."""
    boxes = []
    with open(label_path, 'r') as file:
        lines = file.readlines()
        for line in lines:
            _, x_center, y_center, width, height = map(float, line.strip().split())
            img_width, img_height = image_shape[1], image_shape[0]

            x1 = int((x_center - width / 2) * img_width)
            y1 = int((y_center - height / 2) * img_height)
            x2 = int((x_center + width / 2) * img_width)
            y2 = int((y_center + height / 2) * img_height)

            boxes.append([x1, y1, x2, y2])
    return boxes

def compute_iou(box1, box2):
    """Computes Intersection over Union (IoU) between two bounding boxes."""
    x1, y1, x2, y2 = box1
    x1_, y1_, x2_, y2_ = box2

    xi1 = max(x1, x1_)
    yi1 = max(y1, y1_)
    xi2 = min(x2, x2_)
    yi2 = min(y2, y2_)

    inter_area = max(0, xi2 - xi1) * max(0, yi2 - yi1)
    box1_area = (x2 - x1) * (y2 - y1)
    box2_area = (x2_ - x1_) * (y2_ - y1_)

    union_area = box1_area + box2_area - inter_area
    iou = inter_area / union_area if union_area > 0 else 0
    return iou

def yolo_detection(yolo_model, image):
    """Performs detection on an image using the YOLO model."""
    results = yolo_model(image)
    pred_boxes_final = []
    for result in results:
        for box, conf in zip(result.boxes.xyxy.tolist(), result.boxes.conf.tolist()):
            pred_boxes_final.append([box, conf])
    return pred_boxes_final

def ap_func(pred_boxes, label_boxes, conf, iou_thresholds=np.linspace(0.5, 0.95, 10), eps=1e-16):
    """Computes the average precision for a single class across multiple IoU thresholds."""

    # Sort predictions by confidence
    i = np.argsort(-conf)
    pred_boxes, conf = np.array(pred_boxes)[i], np.array(conf)[i]

    # Initialize arrays for AP and metrics
    ap = np.zeros(len(iou_thresholds))  # Array to store AP for each IoU threshold
    precision = np.zeros(len(iou_thresholds))  # Array to store precision for each IoU threshold
    recall = np.zeros(len(iou_thresholds))  # Array to store recall for each IoU threshold
    tp = np.zeros(len(pred_boxes))  # True positives
    fp = np.zeros(len(pred_boxes))  # False positives
    matched = [False] * len(label_boxes)  # Track matched ground truth boxes

    for j, iou_thres in enumerate(iou_thresholds):
        # Reset TP/FP counters for each IoU threshold
        tp[:] = 0
        fp[:] = 0
        matched[:] = [False] * len(label_boxes)

        for idx, pred_box in enumerate(pred_boxes):
            best_iou = 0
            best_gt_idx = -1

            # Iterate over ground truth boxes to find the best match for the current prediction
            for gt_idx, label_box in enumerate(label_boxes):
                iou = compute_iou(pred_box, label_box)
                if iou > best_iou and iou >= iou_thres:  # Only consider IoU >= threshold
                    best_iou = iou
                    best_gt_idx = gt_idx

            # Assign TP/FP based on IoU threshold
            if best_iou >= iou_thres and not matched[best_gt_idx]:
                tp[idx] = 1  # True positive
                matched[best_gt_idx] = True
            else:
                fp[idx] = 1  # False positive

        # Compute precision and recall
        tpc = tp.cumsum()
        fpc = fp.cumsum()
        recall[j] = tpc[-1] / (len(label_boxes) + eps)  # Recall = TP / (TP + FN)
        precision[j] = tpc[-1] / (tpc[-1] + fpc[-1] + eps)  # Precision = TP / (TP + FP)

        # Compute AP for the current IoU threshold
        ap[j], _, _ = compute_ap(tpc / (len(label_boxes) + eps), tpc / (tpc + fpc + eps))

    return ap, precision, recall  # Arrays of AP, precision, and recall, one entry per IoU threshold

def compute_ap(recall, precision):
    """Computes the Average Precision (AP) given recall and precision curves."""
    # Compute the precision envelope
    mpre = np.concatenate(([0.], np.maximum.accumulate(precision), [0.]))
    mrec = np.concatenate(([0.], recall, [1.]))

    # Compute the AP by summing the precision x recall differences
    ap = np.sum((mrec[1:] - mrec[:-1]) * mpre[1:])
    return ap, mpre, mrec

def map50(ap):
    """Returns the mean Average Precision (mAP) at an IoU threshold of 0.5."""
    return ap[0]

def map50_95(ap):
    """Returns the mean Average Precision (mAP) over IoU thresholds from 0.5 to 0.95."""
    return ap.mean()

# YOLO model
model = YOLO('C:/Users/JFerreira/Desktop/DLP/ir.pt')
labels_dir = "C:/Users/JFerreira/Desktop/DLP/test/labels/"
img_dir = "C:/Users/JFerreira/Desktop/DLP/test/ir/"

# Variables for metrics calculation
all_conf = []
all_pb = []
all_gt = []  # Accumulate ground truth boxes

# Iterate through all label files in the test set
for label_file in os.listdir(labels_dir):
    label_path = os.path.join(labels_dir, label_file)
    image_path = os.path.join(img_dir, label_file.replace('.txt', '.png'))

    if os.path.exists(image_path):
        print(f"Processing {label_file}...")

        # Load image
        image = cv2.imread(image_path)

        if image is None:
            print(f"Skipping {label_file}: unable to load image.")
            continue

        # Run YOLO detection
        detections = yolo_detection(model, image)
        ground_truths = convert_labels_to_boxes(label_path, image.shape)

        # Store detection results for per-class precision-recall calculation
        if detections:
            for det in detections:
                pred_box, conf = det
                all_pb.append(pred_box)
                all_conf.append(conf)

        # Accumulate ground truth boxes
        all_gt.extend(ground_truths)

# Convert lists to numpy arrays for metric calculation
all_pb = np.array(all_pb)
all_conf = np.array(all_conf)

# Compute AP, precision, and recall across multiple IoU thresholds
ap, precision, recall = ap_func(all_pb, all_gt, all_conf, iou_thresholds=np.linspace(0.5, 0.95, 10))

# Compute mAP50 and mAP50-95
map_50 = map50(ap)
map_50_95 = map50_95(ap)

print(f"Precision: {precision[0]:.3f}")
print(f"Recall: {recall[0]:.3f}")
print(f"F1: {2*(precision[0]*recall[0])/(precision[0]+recall[0]):.3f}")
print(f"mAP50: {map_50:.3f}")
print(f"mAP50-95: {map_50_95:.3f}")
print(f"AP: {ap}")

I wrote this script based on the metrics.py script to compute the metrics manually (I'll need this when I merge two different YOLO models in a decision-level approach). However, the results I'm getting don't match the ones from the .val() method. Can anyone spot what I am doing wrong?

The results I get are

Precision: 0.888
Recall: 0.851
F1: 0.870
mAP50: 0.851
mAP50-95: 0.647

They should be

Precision: 0.933
Recall: 0.864
F1: 0.897
mAP50: 0.916
mAP50-95: 0.698

Thanks in advance.

Additional

No response

Skillnoob commented 6 days ago

The val mode uses different conf and iou values than prediction by default. You need to add conf=0.001, iou=0.6 to your prediction code.

JoaoLopesFerreira99 commented 6 days ago

So I should set model.conf=0.001, right? Where should I change the IoU then?

JoaoLopesFerreira99 commented 6 days ago

By the way, is there any method other than .val() to retrieve metrics for a dataset?

JoaoLopesFerreira99 commented 6 days ago

Ok, I just set model.conf=0.001 and model.iou=0.6 but got the same metrics.

Skillnoob commented 6 days ago

No, that is not how you do it. You need to pass these arguments to the prediction call, as in results = model(..., conf=0.001, iou=0.6).
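
For reference, one way this suggestion could be applied to the script above is to thread the thresholds through the yolo_detection helper. This is only a sketch; the conf=0.001 and iou=0.6 values are the ones suggested in this thread to match the val defaults:

def yolo_detection(yolo_model, image, conf=0.001, iou=0.6):
    """Runs detection with explicit conf/iou thresholds passed to the prediction call."""
    results = yolo_model(image, conf=conf, iou=iou)  # thresholds go into the call, not onto the model object
    pred_boxes_final = []
    for result in results:
        for box, box_conf in zip(result.boxes.xyxy.tolist(), result.boxes.conf.tolist()):
            pred_boxes_final.append([box, box_conf])
    return pred_boxes_final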

Skillnoob commented 6 days ago

By the way, is there any method other than .val() to retrieve metrics for a dataset?

No

JoaoLopesFerreira99 commented 6 days ago

Precision: 0.176
Recall: 0.938
F1: 0.297
mAP50: 0.938
mAP50-95: 0.680

Okay, the metrics changed, but they are not good either...

glenn-jocher commented 5 days ago

It seems like the precision is low. You might want to check your confidence threshold and ensure your dataset annotations are accurate. Adjusting these can help improve precision.
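
To see how much the reporting threshold alone moves these numbers, one option is to re-run ap_func from the script above on only the detections kept at a higher confidence cut-off. A rough sketch reusing the variables already defined there; the 0.25 cut-off is an arbitrary illustrative value, not the threshold .val() uses internally:

# Illustrative only: recompute the metrics on the subset of detections above a higher confidence
conf_cutoff = 0.25  # arbitrary example value
keep = all_conf >= conf_cutoff
ap_hi, precision_hi, recall_hi = ap_func(all_pb[keep], all_gt, all_conf[keep])
print(f"Precision@{conf_cutoff}: {precision_hi[0]:.3f}")
print(f"Recall@{conf_cutoff}: {recall_hi[0]:.3f}")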

JoaoLopesFerreira99 commented 4 days ago

Yeah, I understand that. I only set the confidence threshold to 0.001 because it was what Skillnoob suggested. However, I still do not understand what I am missing in my script.

Skillnoob commented 4 days ago

You can look at how ultralytics calculates the mAP etc. https://github.com/ultralytics/ultralytics/blob/c2068df9d981c5ae27fedee550bdaedddcec3c53/ultralytics/engine/validator.py#L39 & https://github.com/ultralytics/ultralytics/blob/c2068df9d981c5ae27fedee550bdaedddcec3c53/ultralytics/models/yolo/detect/val.py#L17
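
For cross-checking the hand-rolled numbers against those implementations, the built-in validator can also be run directly and its box metrics read off the returned object. A minimal sketch, assuming a hypothetical data.yaml that points at the same test split:

from ultralytics import YOLO

model = YOLO('C:/Users/JFerreira/Desktop/DLP/ir.pt')
metrics = model.val(data='data.yaml', split='test')  # 'data.yaml' is a placeholder dataset config

print(f"Precision: {metrics.box.mp:.3f}")   # mean precision over classes
print(f"Recall: {metrics.box.mr:.3f}")      # mean recall over classes
print(f"mAP50: {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")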