ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com

mAP code not working when used separately #11499

Closed pathikg closed 1 year ago

pathikg commented 1 year ago

Search before asking

YOLOv5 Component

Validation

Bug

My problem statement: given an image, ground-truth bounding boxes, and predicted bounding boxes, generate a mAP score, e.g.

Model summary: 290 layers, 20869098 parameters, 0 gradients
               Class     Images     Labels          P          R     mAP@.5   mAP@.5:.95
                 all        125        227       0.67      0.601       0.66      0.468
           Ambulance        125         32      0.748      0.781       0.85      0.691
                 Bus        125         23      0.655      0.696      0.654       0.48
                 Car        125        119      0.651      0.579      0.606      0.419
          Motorcycle        125         23      0.714      0.651       0.78      0.436
               Truck        125         30      0.583        0.3      0.409      0.316

So I tried replicating the run() function from val.py in a separate notebook: https://github.com/ultralytics/yolov5/blob/c3e4e94e944de3b41b3398e2f78e596384739339/val.py#L99

I'll be adding the screenshots of my notebook with some outputs:

I ran the same evaluation using python val.py ... and got a mAP score of about 0.71. So my question is: am I missing anything in the code when running it separately in the notebook?
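For reference, the same evaluation can also be invoked programmatically instead of re-implementing the loop; this is only a rough sketch with placeholder paths, assuming the notebook is run from the yolov5 repo root:

import val  # yolov5's val.py imported as a module

# Rough sketch only: paths below are placeholders, not my actual command
results, maps, times = val.run(data='path/to/data.yaml',      # dataset YAML (placeholder)
                               weights='weights/weights.pt',   # trained weights (placeholder)
                               imgsz=1280,
                               batch_size=1,
                               task='test')                    # evaluate on the test split
print(results[3])  # mAP@0.5:0.95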

Environment

Minimal Reproducible Example

import numpy as np
import torch
from pathlib import Path
from tqdm import tqdm

from utils.dataloaders import create_dataloader
from utils.metrics import ConfusionMatrix, ap_per_class, box_iou
from utils.general import (LOGGER, TQDM_BAR_FORMAT, Profile, check_dataset, check_img_size, check_requirements,
                           check_yaml, coco80_to_coco91_class, colorstr, increment_path, non_max_suppression,
                           print_args, scale_boxes, xywh2xyxy, xyxy2xywh)

device = "cuda" if torch.cuda.is_available() else "cpu"
nc = 33  # number of classes
iouv = torch.linspace(0.5, 0.95, 10, device=device)  # iou vector for mAP@0.5:0.95
niou = iouv.numel()

model = torch.hub.load('.', 'custom', path='weights/weights.pt', source='local')  # local repo

stride, pt = model.stride, model.pt
batch_size = 1
imgsz = 1280
pad, rect = (0.5, pt) 

imgsz = check_img_size(imgsz, s=stride) 

dataloader = create_dataloader("../data-prep/dataset_2970_split/test/",
                                       imgsz,
                                       batch_size,
                                       stride,
                                       single_cls=False,
                                       pad=pad,
                                       rect=rect,
                                       workers=8,
                                       prefix=colorstr(f'test : '))[0]

seen = 0
is_coco = False
names = model.names if hasattr(model, 'names') else model.module.names
if isinstance(names, (list, tuple)):  # old format
    names = dict(enumerate(names))
tp, fp, p, r, f1, mp, mr, map50, ap50, map = 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

jdict, stats, ap, ap_class = [], [], [], []
conf_thres = 0.6
iou_thres = 0.6
save_hybrid = False

s = ('%22s' + '%11s' * 6) % ('Class', 'Images', 'Instances', 'P', 'R', 'mAP50', 'mAP50-95')
pbar = tqdm(dataloader, desc=s, bar_format=TQDM_BAR_FORMAT)  # progress bar
for batch_i, (im, targets, paths, shapes) in enumerate(pbar):
    nb, _, height, width = im.shape
    preds, train_out = model(im), None

    targets[:, 2:] *= torch.tensor((width, height, width, height), device=device)
    lb = [targets[targets[:, 0] == i, 1:] for i in range(nb)] if save_hybrid else []  # for autolabelling

    preds = non_max_suppression(preds,
                                conf_thres,
                                iou_thres,
                                labels=lb,
                                multi_label=True,
                                agnostic=None,
                                max_det=300)

    break

def process_batch(detections, labels, iouv):
    """
    Return correct prediction matrix
    Arguments:
        detections (array[N, 6]), x1, y1, x2, y2, conf, class
        labels (array[M, 5]), class, x1, y1, x2, y2
    Returns:
        correct (array[N, 10]), for 10 IoU levels
    """
    correct = np.zeros((detections.shape[0], iouv.shape[0])).astype(bool)
    iou = box_iou(labels[:, 1:], detections[:, :4])
    correct_class = labels[:, 0:1] == detections[:, 5]
    for i in range(len(iouv)):
        x = torch.where((iou >= iouv[i]) & correct_class)  # IoU > threshold and classes match
        if x[0].shape[0]:
            matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()  # [label, detect, iou]
            if x[0].shape[0] > 1:
                matches = matches[matches[:, 2].argsort()[::-1]]
                matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
                # matches = matches[matches[:, 2].argsort()[::-1]]
                matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
            correct[matches[:, 1].astype(int), i] = True
    return torch.tensor(correct, dtype=torch.bool, device=iouv.device)

for si, pred in enumerate(preds):
    labels = targets[targets[:, 0] == si, 1:]
    nl, npr = labels.shape[0], pred.shape[0]  # number of labels, predictions
    path, shape = Path(paths[si]), shapes[si][0]
    correct = torch.zeros(npr, niou, dtype=torch.bool, device=device)  # init
    seen += 1

    if npr == 0:
        if nl:
            stats.append((correct, *torch.zeros((2, 0), device=device), labels[:, 0]))
        continue

    # Predictions
    predn = pred.clone()
    scale_boxes(im[si].shape[1:], predn[:, :4], shape, shapes[si][1])  # native-space pred

    # Evaluate
    if nl:
        tbox = xywh2xyxy(labels[:, 1:5])  # target boxes
        scale_boxes(im[si].shape[1:], tbox, shape, shapes[si][1])  # native-space labels
        labelsn = torch.cat((labels[:, 0:1], tbox), 1)  # native-space labels
        correct = process_batch(predn, labelsn, iouv)
    stats.append((correct, pred[:, 4], pred[:, 5], labels[:, 0]))  # (correct, conf, pcls, tcls)

stats = [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)]
if len(stats) and stats[0].any():
    tp, fp, p, r, f1, ap, ap_class = ap_per_class(*stats, names=names)
    ap50, ap = ap[:, 0], ap.mean(1)  # AP@0.5, AP@0.5:0.95
    mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
nt = np.bincount(stats[3].astype(int), minlength=nc)  # number of targets per class

print(map)                                                                  

Additional

No response

Are you willing to submit a PR?

glenn-jocher commented 1 year ago

@pathikg hello,

Thanks for providing the detailed replication steps and code.

I believe the issue you are encountering in the notebook setup may be related to a version difference between the model used in val.py and the model used in your notebook. Additionally, since the confidence threshold and IoU threshold can also have an impact on the results, please ensure that you are using the same values for these parameters as in val.py.
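For reference, val.py's run() uses a much lower default confidence threshold than the 0.6 set in your notebook, and a high conf_thres will drop low-confidence detections and usually report a lower mAP:

# val.py run() defaults (shown for comparison; please check your local copy)
conf_thres = 0.001  # confidence threshold used when computing mAP
iou_thres = 0.6     # NMS IoU threshold
# The notebook sets conf_thres = 0.6, which is fine for deployment-style inference
# but typically lowers recall and therefore the reported mAP.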

Moreover, as you seem to be using a specific dataset, I would advise you to double-check if the paths to your files are correctly set.

I hope this helps. Please let me know if you have any further questions or issues.

Best regards!

pathikg commented 1 year ago

Thanks for responding @glenn-jocher
I double-checked everything, and it all seems the same as in val.py.

Though there is one step I did not replicate, and it was the following: https://github.com/ultralytics/yolov5/blob/c3e4e94e944de3b41b3398e2f78e596384739339/val.py#L143

I did it as follows: model = DetectMultiBackend(weights_path, device=device, dnn=False, data="dataset_7412/data.yaml", fp16=False)
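For context, the surrounding model-loading lines in val.py look roughly like this (paraphrased, not copied verbatim; see the file at the commit linked above for the exact code):

# Paraphrased from val.py's model-loading section
model = DetectMultiBackend(weights_path, device=device, dnn=False, data="dataset_7412/data.yaml", fp16=False)
stride, pt = model.stride, model.pt
imgsz = check_img_size(imgsz, s=stride)  # ensure the image size is a multiple of the model stride
model.warmup(imgsz=(1 if pt else batch_size, 3, imgsz, imgsz))  # one dummy forward pass to warm up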

So I repeated all the steps as before, and during inference I got this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[13], line 6
      4 nb, _, height, width = im.shape
      5 im = im.to(device)
----> 6 preds, train_out = model(im), None
      8 targets[:, 2:] *= torch.tensor((width, height, width, height), device=device)
      9 lb = [targets[targets[:, 0] == i, 1:] for i in range(nb)] if save_hybrid else []  # for autolabelling

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /mnt/batch/tasks/shared/LS_root/mounts/clusters/pibit-ml-cpu/code/health-claim/yolov5/models/common.py:514, in DetectMultiBackend.forward(self, im, augment, visualize)
    511     im = im.permute(0, 2, 3, 1)  # torch BCHW to numpy BHWC shape(1,320,192,3)
    513 if self.pt:  # PyTorch
--> 514     y = self.model(im, augment=augment, visualize=visualize) if augment or visualize else self.model(im)
    515 elif self.jit:  # TorchScript
    516     y = self.model(im)

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /mnt/batch/tasks/shared/LS_root/mounts/clusters/pibit-ml-cpu/code/health-claim/yolov5/models/yolo.py:209, in DetectionModel.forward(self, x, augment, profile, visualize)
    207 if augment:
    208     return self._forward_augment(x)  # augmented inference, None
--> 209 return self._forward_once(x, profile, visualize)

File /mnt/batch/tasks/shared/LS_root/mounts/clusters/pibit-ml-cpu/code/health-claim/yolov5/models/yolo.py:121, in BaseModel._forward_once(self, x, profile, visualize)
    119 if profile:
    120     self._profile_one_layer(m, x, dt)
--> 121 x = m(x)  # run
    122 y.append(x if m.i in self.save else None)  # save output
    123 if visualize:

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /mnt/batch/tasks/shared/LS_root/mounts/clusters/pibit-ml-cpu/code/health-claim/yolov5/models/common.py:59, in Conv.forward_fuse(self, x)
     58 def forward_fuse(self, x):
---> 59     return self.act(self.conv(x))

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py:889, in Module._call_impl(self, *input, **kwargs)
    887     result = self._slow_forward(*input, **kwargs)
    888 else:
--> 889     result = self.forward(*input, **kwargs)
    890 for hook in itertools.chain(
    891         _global_forward_hooks.values(),
    892         self._forward_hooks.values()):
    893     hook_result = hook(self, input, result)

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/conv.py:399, in Conv2d.forward(self, input)
    398 def forward(self, input: Tensor) -> Tensor:
--> 399     return self._conv_forward(input, self.weight, self.bias)

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/conv.py:395, in Conv2d._conv_forward(self, input, weight, bias)
    391 if self.padding_mode != 'zeros':
    392     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    393                     weight, bias, self.stride,
    394                     _pair(0), self.dilation, self.groups)
--> 395 return F.conv2d(input, weight, bias, self.stride,
    396                 self.padding, self.dilation, self.groups)

RuntimeError: expected scalar type Byte but found Float

Do you know what can be done?

pathikg commented 1 year ago

Okay, the issue was solved after doing im = im.float().to(device),

but the predictions are still not correct. E.g. when I load the torch.hub model, I get the following predictions: (screenshot)

but when I load the model using the snippets from val.py on the same image, I get the following: (screenshot)

github-actions[bot] commented 1 year ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

glenn-jocher commented 1 year ago

@pathikg

Glad to hear that you were able to resolve the initial issue regarding the expected scalar type. Regarding the discrepancy in predictions between the torch.hub model and the custom model loaded using the val.py setup, I would suggest checking the following:

  1. Ensure that the models are loaded with the same configuration and weights.

  2. Validate that the pre-processing steps are consistent between the two methods, including input image normalization, resizing, and format (see the sketch after this list).

  3. Verify that the post-processing steps such as non-maximum suppression (NMS) and confidence thresholding are consistent across the models.

  4. Double-check if the model input data (inference image) is the same for both methods.
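On point 2 specifically: val.py does more than move the batch to the device; it also casts the uint8 batch to half/float and scales pixel values from 0-255 to 0.0-1.0 before inference, whereas the hub AutoShape wrapper applies its own letterboxing and normalization only when given image paths or arrays, not an already-batched tensor. A minimal sketch of the val.py-style loop, using the variable names from your notebook:

half = False  # use FP16 only for an FP16 model on GPU
for batch_i, (im, targets, paths, shapes) in enumerate(dataloader):
    im = im.to(device, non_blocking=True)
    targets = targets.to(device)
    im = im.half() if half else im.float()  # uint8 -> fp16/32
    im /= 255                               # 0-255 -> 0.0-1.0, as val.py does before inference
    nb, _, height, width = im.shape
    preds, train_out = model(im), None
    # ... rest of the loop as in your notebook (NMS, process_batch, stats) ...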

If these aspects are aligned and you are still receiving inconsistent predictions, you may consider comparing the model structures and configurations between the torch.hub model and the custom model loaded with the val.py setup to ensure that they are indeed the same.
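A quick way to do that comparison is to feed one identical, already-preprocessed tensor to both loading paths and compare the raw outputs. This is only a sketch: the weights path is a placeholder, and it assumes the torch.hub wrapper exposes its underlying network as .model:

import torch
from models.common import DetectMultiBackend

weights = 'weights/weights.pt'  # placeholder path
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# identical dummy input, already scaled to 0-1 as val.py expects
im = torch.rand(1, 3, 1280, 1280, device=device)

hub_model = torch.hub.load('.', 'custom', path=weights, source='local')
dmb_model = DetectMultiBackend(weights, device=device, fp16=False)

with torch.no_grad():
    y_hub = hub_model.model(im)[0]  # assumes the hub wrapper stores the raw network as .model
    y_dmb = dmb_model(im)[0]

print(torch.allclose(y_hub, y_dmb, atol=1e-5))  # True if both paths load the same network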

I hope these suggestions help. Let me know if you have further questions or require additional assistance.