ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Questions about mAP of inference #5116

Closed Zengyf-CVer closed 3 years ago

Zengyf-CVer commented 3 years ago

@glenn-jocher Hello, I have been researching the mAP computation in the inference part and found some problems:

glenn-jocher commented 3 years ago

@Zengyf-CVer repo mAP aligns with 3rd party tools like pycocotools to within about 1%.

I don't understand the rest of your question, but detections and labels are normalized, so the size of the original image does not matter; they apply to images of any size.
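A minimal sketch of what this means (the helper name and example values are illustrative, not from the repo): the same normalized box describes the same relative region at any resolution.

def yolo_norm_to_pixel_xyxy(cx, cy, w, h, img_w, img_h):
    # normalized YOLO box (center x, center y, width, height) -> pixel (x1, y1, x2, y2)
    return [(cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h]

box = (0.5, 0.5, 0.2, 0.4)                       # same normalized box...
print(yolo_norm_to_pixel_xyxy(*box, 1360, 800))  # ...on an original-size image
print(yolo_norm_to_pixel_xyxy(*box, 640, 640))   # ...on an inference-size image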

Zengyf-CVer commented 3 years ago

@glenn-jocher I did this experiment on a custom dataset, as shown in the figure: ksnip_20211011-121334

I found that there is a certain gap between the results of the repo and third-party pycocotools: the AP50 difference is only 0.7%, but the AP@.5:.95 difference is much larger, at 5.6%.

glenn-jocher commented 3 years ago

@Zengyf-CVer ah I see. That's interesting. We don't have a solution for this pycocotools difference. In fact we have an open competition with a 1000 EUR reward to close it, see https://github.com/ultralytics/yolov5/issues/2258

Zengyf-CVer commented 3 years ago

@glenn-jocher Which one is more accurate, that is, closer to the true AP: the repo or pycocotools?

Zengyf-CVer commented 3 years ago

@glenn-jocher Although I don't know how to close this gap, I found an interesting issue during my research: I exported the AP data in your project and found that it is an (nc × 10) matrix, where nc is the number of categories. The first column holds the per-class AP50 (its mean over classes is the reported AP50), the second column AP55, and so on, with the last column being AP95. I therefore compared the 10 indicators in the repo with the 10 indicators in pycocotools and found a problem: AP95 in the repo is obviously much larger than AP95 in pycocotools. This is also why AP@.5:.95 in the repo is larger than in pycocotools. My initial guess is that the project's AP calculation is not very accurate at the high IoU thresholds (AP90 and AP95). At the same time, this also puzzles me.
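For reference, a minimal sketch (with placeholder values) of how such an (nc × 10) AP matrix reduces to the per-IoU-threshold APs and to AP@.5:.95, assuming ap is the matrix described above:

import numpy as np

# ap: (nc, 10) per-class AP at IoU thresholds 0.50, 0.55, ..., 0.95 (placeholder values here)
ap = np.random.rand(3, 10)

iou_thresholds = np.linspace(0.5, 0.95, 10)
per_iou_ap = ap.mean(axis=0)      # mean over classes -> AP50, AP55, ..., AP95
ap50, ap95 = per_iou_ap[0], per_iou_ap[-1]
ap_50_95 = ap.mean()              # mean over classes and thresholds -> AP@.5:.95

for t, v in zip(iou_thresholds, per_iou_ap):
    print(f"AP@{t:.2f}: {v:.3f}")
print(f"AP@.5:.95: {ap_50_95:.3f}")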

Zengyf-CVer commented 3 years ago

@glenn-jocher I found the reference linked in the AP calculation function: https://github.com/ultralytics/yolov5/blob/ba4b79de8be1c4275cf9d04aaaac988446c62a25/utils/metrics.py#L21-L23. That author has a newer metrics project, https://github.com/rafaelpadilla/review_object_detection_metrics, and in particular https://github.com/rafaelpadilla/review_object_detection_metrics/blob/main/src/evaluators/coco_evaluator.py implements the COCO metric calculation. I hope it can be of some help to you.

greenkarson commented 2 years ago

@glenn-jocher I did this experiment on a custom dataset, as shown in the figure: ksnip_20211011-121334

I found that there is a certain gap between the results of the repo and third-party pycocotools: the AP50 difference is only 0.7%, but the AP@.5:.95 difference is much larger, at 5.6%.

Hello, when evaluating my own dataset, do I need to rename the dataset's image files to the COCO style, e.g. 000000001.jpg? With custom file names I get "Results do not correspond to current coco set", so the image_id cannot be matched. How are the files in your custom dataset named?

shenhaibb commented 2 years ago
  • When I calculated mAP@0.5 and mAP@.5:.95 separately on the original images (1360x800), generating the labels file from the GT label files of the validation set and the predictions of the previous verification step, I got different results: mAP@0.5 is 0.972 and mAP@.5:.95 is 0.764. There is a big gap [...]

So do I!! Did you solve it?

Zengyf-CVer commented 2 years ago

@greenkarson I also ran into the "Results do not correspond to current coco set" problem before. Here are a few solutions:

Also, the image file names do not need to be changed to the COCO format; keep them exactly as they are.
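What matters is only that the image_id values in the detection results JSON match the ids in the ground-truth JSON. A minimal sketch (file names here are illustrative) of using one shared filename-to-id mapping for both files so pycocotools can match them:

import json

# Use one shared filename -> integer id mapping for both the ground-truth JSON and the
# detection results JSON, so pycocotools can match predictions to images no matter how
# the files are named. The file names below are illustrative.
file_names = ["img_0001.png", "img_0002.png", "photo_abc.png"]
image_ids = {name: i + 1 for i, name in enumerate(sorted(file_names))}

gt_images = [{"id": image_ids[n], "file_name": n, "width": 1360, "height": 800}
             for n in file_names]
detection = {
    "image_id": image_ids["photo_abc.png"],   # looked up from the same mapping as the GT
    "category_id": 1,
    "bbox": [10.0, 20.0, 50.0, 80.0],         # COCO format: x, y, width, height in pixels
    "score": 0.9,
}
print(json.dumps(gt_images, indent=2))
print(json.dumps(detection, indent=2))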

Zengyf-CVer commented 2 years ago

@shenhaibb I can share my experience with you:

Therefore, if you are planning a comparison experiment or future research, you can use the COCO format for evaluation, because it is uniform across different networks.

sanchit2843 commented 1 year ago

Is this problem solved? I'm using https://github.com/rafaelpadilla/review_object_detection_metrics to evaluate the output I get from running detect.py on a custom dataset. The difference in mAP is around 11 points.

I also used [FiftyOne](https://docs.voxel51.com/tutorials/evaluate_detections.html#Evaluate-detections), and the difference in the result is almost the same. I'm unsure if I'm doing something wrong with the conversions or if there's a problem with the output of detect.py.

Hi, I also tried with pycocotools, and the results are even lower. Here is the script I used to convert the ground-truth and prediction text files to the JSON files required by pycocotools. I would really appreciate it if you could take a look at it.

import json
import os

import imagesize
from tqdm import tqdm

def yolo_to_coco(
    yolo_txt_path, image_dir, output_zip_path, base_json_path, yolo_detect_txt_path
):
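    # Build a COCO-format ground-truth JSON and a detection-results JSON from
    # YOLO-format label and prediction .txt files, for evaluation with pycocotools.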

    category_ids_dict_yolo = {0: "a", 2: "b", 1: "c"}
    coco_json = json.load(open(base_json_path))
    category_idx_dict_coco = {v["name"]: v["id"] for v in coco_json["categories"]}
    images_info = []
    annotations_coco = []
    detect_json = []
    total_annotation_count = 1
    for idx, image_name in tqdm(enumerate(os.listdir(image_dir))):
        width, height = imagesize.get(os.path.join(image_dir, image_name))
        images_info.append(
            {
                "id": idx + 1,
                "file_name": image_name,
                "width": width,
                "height": height,
                "license": 0,
                "flickr_url": "",
                "coco_url": "",
                "date_captured": 0,
            }
        )
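        # skip images that have no GT label file (note: their detections are skipped too)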
        if (
            os.path.exists(
                os.path.join(yolo_txt_path, image_name.replace("png", "txt"))
            )
            is False
        ):
            continue
        with open(os.path.join(yolo_txt_path, image_name.replace("png", "txt"))) as f:
            annotations = f.readlines()

        for annotation in annotations:
            # convert YOLO normalized (cx, cy, w, h) GT boxes to COCO pixel (x, y, w, h)
            annotation = annotation.split(" ")
            category_id_coco = category_idx_dict_coco[
                category_ids_dict_yolo[int(annotation[0])]
            ]
            bbox = [float(i) for i in annotation[1:5]]
            x1, y1, w, h = bbox[0], bbox[1], bbox[2], bbox[3]
            x1 = (x1 - w / 2) * width
            y1 = (y1 - h / 2) * height
            w = w * width
            h = h * height
            annotations_coco.append(
                {
                    "id": total_annotation_count,
                    "image_id": idx + 1,
                    "category_id": category_id_coco,
                    "area": w * h,
                    "segmentation": [],
                    "bbox": [x1, y1, w, h],
                    "iscrowd": 0,
                    "attributes": {"occluded": False},
                }
            )
            total_annotation_count += 1
        if (
            os.path.exists(
                os.path.join(yolo_detect_txt_path, image_name.replace("png", "txt"))
            )
            is False
        ):
            continue
        with open(
            os.path.join(yolo_detect_txt_path, image_name.replace("png", "txt"))
        ) as f:
            annotations = f.readlines()
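        # convert detection lines ("class cx cy w h conf", normalized) into COCO results entries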
        for annotation in annotations:
            annotation = annotation.split(" ")
            bbox = [float(i) for i in annotation[1:5]]
            x1, y1, w, h = bbox[0], bbox[1], bbox[2], bbox[3]
            x1 = (x1 - w / 2) * width
            y1 = (y1 - h / 2) * height
            w = w * width
            h = h * height
            detect_json.append(
                {
                    "image_id": idx + 1,
                    "category_id": category_idx_dict_coco[
                        category_ids_dict_yolo[int(annotation[0])]
                    ],
                    "bbox": [x1, y1, w, h],
                    "score": float(annotation[5]),
                }
            )
    coco_json["images"] = images_info
    coco_json["annotations"] = annotations_coco
    os.makedirs(os.path.join(output_zip_path, "temp"), exist_ok=True)
    with open(
        os.path.join(output_zip_path, "instances_defaults.json"),
        "w",
    ) as f:
        json.dump(coco_json, f)
    with open(
        os.path.join(output_zip_path, "detect.json"),
        "w",
    ) as f:
        json.dump(detect_json, f)

if __name__ == "__main__":
    yolo_txt_path = "test/labels"
    image_dir = "test/images"
    yolo_detect_txt_path = "/home/ubuntu/detection/runs/detect/conf_thresh_0.2/labels"
    output_zip_path = "result_check1"
    os.makedirs(output_zip_path, exist_ok=True)
    base_json_path = "./instances_default.json"
    yolo_to_coco(
        yolo_txt_path, image_dir, output_zip_path, base_json_path, yolo_detect_txt_path
    )
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    annType = "bbox"

    cocoGt = COCO(os.path.join(output_zip_path, "instances_defaults.json"))
    cocoDt = cocoGt.loadRes(os.path.join(output_zip_path, "detect.json"))

    imgIds = sorted(cocoGt.getImgIds())

    # running evaluation
    cocoEval = COCOeval(cocoGt, cocoDt, annType)
    cocoEval.params.imgIds = imgIds
    # cocoEval.params.catIds = [3]
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()

glenn-jocher commented 11 months ago

@sanchit2843 We appreciate your detailed report. There are a few key points that may be contributing to the differences:

  1. Data format: Ensure that the data format being used for evaluation matches the format expected by the evaluation library. For COCO evaluation, the data should adhere to the required COCO JSON format.

  2. Bounding box conversion: Double-check the accuracy of the bounding box conversions from the YOLO format to the COCO format, ensuring that the conversions are consistent and correct (a round-trip sanity check is sketched below).

  3. Evaluation procedure: You are comparing metrics from multiple libraries. Ensure that the evaluation procedure and metric calculations are consistent across them, as different libraries may have slightly different implementations, which can lead to varying results.

  4. Detect.py output: Double-check the output generated by detect.py to ensure that the format and content of the predictions align with the expectations of the evaluation libraries.

By carefully reviewing these aspects and ensuring consistency in data formats, bounding box conversions, evaluation procedures, and output formats, it may be possible to narrow down the source of the discrepancies in the results.
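
For point 2, a minimal round-trip sanity check of the YOLO-to-COCO box conversion used in the script above (helper names and example values are illustrative):

def yolo_to_coco_bbox(cx, cy, w, h, img_w, img_h):
    # normalized center-based box -> COCO top-left pixel box (x, y, w, h)
    return [(cx - w / 2) * img_w, (cy - h / 2) * img_h, w * img_w, h * img_h]

def coco_to_yolo_bbox(x, y, w, h, img_w, img_h):
    # COCO top-left pixel box -> normalized center-based box (cx, cy, w, h)
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

# round-trip check on an illustrative box and image size
yolo_box = [0.5, 0.5, 0.25, 0.4]
coco_box = yolo_to_coco_bbox(*yolo_box, 1360, 800)
back = coco_to_yolo_bbox(*coco_box, 1360, 800)
assert all(abs(a - b) < 1e-9 for a, b in zip(yolo_box, back)), (yolo_box, back)
print(coco_box)  # [510.0, 240.0, 340.0, 320.0]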

If the issue persists, feel free to ask for further support.