laclouis5 / globox

A package to read and convert object detection datasets (COCO, YOLO, PascalVOC, LabelMe, CVAT, OpenImage, ...) and evaluate them with COCO and PascalVOC metrics.

Default `relative` value for yolo_v5 doesn't reflect detections from yolo_v5 #49

Open topherbuckley opened 1 month ago

topherbuckley commented 1 month ago

Describe the bug

Keeping this definition here as I find the word "relative" ambiguous. As written in the source:

relative (between 0 and 1) to absolute (pixels)

YOLOv5 models appear to return absolute values, whereas datasets exported in the yolov5-pytorch format from Roboflow are given in relative coords.

AnnotationSet.from_yolo_v5 should either default to relative=False or at least allow specifying it; otherwise relative is wrongly set to True when run on YOLOv5 inference detections, although it is correct for ground-truth datasets exported by Roboflow in the yolov5-pytorch format.
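To make the failure mode concrete, here is a toy example with made-up numbers showing why treating absolute pixel coords as relative produces the huge values seen below:

    # Made-up numbers: if pixel coordinates are assumed to be relative,
    # they get scaled by the image size a second time.
    image_width = 640
    x_center_pixels = 320.0                        # what the model actually returned
    x_center_misread = x_center_pixels * image_width
    print(x_center_misread)                        # 204800.0 -> "bizarrely high"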

To Reproduce

import torch
from globox import AnnotationSet  # needed for the parsing step below
model = torch.hub.load("ultralytics/yolov5", "custom", path=model_path)
image = 'dataset/test/images/image.jpg'
results = model(image)
print(results.pandas().xywh[0]) # Note absolute coords
filename = results.files[0]
# Remove .jpg extension
filename = filename[:-4]
# Remove 'name' column and reorder to expected order
df = results.pandas().xywh[0][["class", "xcenter", "ycenter", "width", "height", "confidence"]]
output_file = f"detections/{filename}.txt"
df.to_csv(output_file, sep=" ", index=False, header=False)
predictions = AnnotationSet.from_yolo_v5(folder="./detections", image_folder="./dataset/test/images")
print(predictions.get(results.files[0]))  # Note bizarrely high values for coords

Expected behavior

I expected the default value of relative to be correct both for detections produced by YOLOv5 and for datasets downloaded from places like Roboflow, so that coords are converted to absolute ones only when necessary.

Environment (please complete the following information):

  • Ubuntu 22.04
  • Python 3.10.12
  • globox 2.4.6
  • yolov5 (repo on master, currently at hash a3555241)

Additional context

As per this discussion, which goes into great detail on how I determined this to be the case.

laclouis5 commented 1 month ago

Hi @topherbuckley,

Unfortunately, the YoloV5 annotation format has no solid specification and may be implemented differently by different organisations. YOLO annotation formats are generally very inconsistent, contrary to other more stable formats (COCO, CVAT).

I don't think it would make sense to add the relative parameter to the AnnotationSet.from_yolo_v5 method, as this would essentially make it a duplicate of the more general AnnotationSet.from_txt method.

Here are two options for your use case:

topherbuckley commented 1 month ago

Thanks for your reply. (And maintaining this!)

Unfortunately, the YoloV5 annotation format has no solid specification and may be implemented differently by different organisations.

May I ask where the decision to use the current default of relative=True came from? Roboflow datasets? Just curious what was used as the "standard" here, since it appears the yolov5 Detections source code was not.

I don't think it would make sense to add the relative parameter to the AnnotationSet.from_yolo_v5 method, as this would essentially make it a duplicate of the more general AnnotationSet.from_txt method.

Here is everything I had to set when using from_txt (not just relative):

    from globox import AnnotationSet, BoxFormat

    predictions = AnnotationSet.from_txt(
        folder="./detections",
        image_folder="./dataset/test",
        box_format=BoxFormat.XYWH,  # otherwise defaults to LTRB
        relative=False,
        conf_last=True,  # otherwise defaults to False
    )

In the end it's not a huge deal either way now that I have it implemented, but I'd argue from other users' perspective that having to trace through the source to find the defaults and override three of them for from_txt is more difficult than overriding one default in from_yolo_v5 when they know that's what they need to change.

Is there any documentation that shows what all the exposed and non-exposed parameters resolve to by the time they get to BoundingBox.from_txt? That would save people from having to trace it through the source at least. Might be easy enough to write a test script to log them and parse to a doc. Or maybe this is included in the verbose output?
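Something like this quick sketch (using only the standard inspect module) is what I had in mind for logging the exposed defaults, though it only shows the top-level keyword defaults, not what they ultimately resolve to inside BoundingBox.from_txt:

    import inspect

    from globox import AnnotationSet

    # Dump the default values of the exposed keyword parameters of from_txt.
    for name, param in inspect.signature(AnnotationSet.from_txt).parameters.items():
        if param.default is not inspect.Parameter.empty:
            print(f"{name} = {param.default!r}")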

laclouis5 commented 1 month ago

@topherbuckley As far as I remember, the first YOLO annotation format was proposed by the Darknet framework (YOLO v1 to YOLO v3 era) and the coordinates were relative to the image size. This was for ground-truth annotations at least. For predicted annotations, the confidence was stored before the coordinates, just after the label id, but I remember that the coordinates were still relative to the image size.

Things changed with the Ultralytics era of YOLO models (v5 to v7 and above). The confidence score for predictions is stored in last position in the annotation file. For the coordinates, I remember seeing both relative and absolute forms, but I don't remember well, thus I stuck to relative ones.
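For reference, this is roughly what I mean, with made-up numbers (only my recollection of the two layouts, not an authoritative spec):

    # Darknet-era prediction line: label id, confidence, then relative cx cy w h.
    darknet_pred = "0 0.87 0.51 0.43 0.12 0.25"
    # Ultralytics-era (v5/v7) prediction line: label id, cx cy w h, confidence last.
    ultralytics_pred = "0 0.51 0.43 0.12 0.25 0.87"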

There are two possible solutions:

In addition, I never wrote complete documentation; I can improve it for sure!

topherbuckley commented 1 month ago

Thanks for the background info. As I mentioned in the related discussion, I'm happy to submit a PR for the latter of the two options, but I can only promise knowledge of the Ultralytics v5 results rather than any other variant you have built in here.

I don't have enough experience with this to know if this is an actual issue or not, but the first option may also be useful in its own right, so users have the option to pull the detections either from disk or from memory depending on the size of the detection set. It may be favorable one way or another depending on the computer being used (i.e. memory limitations). Just a thought; for now I'm happy to implement from_yolo_result.

laclouis5 commented 1 month ago

In general, if you want to implement a new parser you have to implement the whole stack:

Implementing the whole stack allows more flexibility.

topherbuckley commented 1 month ago
  • from_yolov7_result

Guess this is a typo, but otherwise understood. Thanks, I'll open a PR when ready. Do you have any tests specific to these that I should be aware of?

laclouis5 commented 1 month ago

Yes, it's a typo, though I think that YOLOv5 and YOLOv7 use the same format.

If you want to add some tests, you can also submit a PR to globox_test_data to add some YOLOv5 results annotations that will be used in the tests. In this data repo, you can put the detection files in the dets subfolder. Please note that the tests check that all the detections of that folder are equivalent (same images, same number of boxes, etc.), thus you should add predictions that match the ones already present in that folder.

The simplest way to achieve that is to also create methods to export annotations to your new format (BoundingBox.to_yolo_v5 and so on). You can then convert an existing dataset (for instance dets/abs_ltrb) to your new format and manually check that it is correct.

You can then implement parsing and conversion tests, respectively in tests/test_parsing and tests/test_conversion. This should be quite straightforward: just add the relevant dataset path in constants.py and then add the detections to the dets_sets variable in both tests.
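Roughly, the wiring would look like this (the paths and variable names here are only indicative; the real ones are in tests/constants.py and in the globox_test_data repository):

    from pathlib import Path

    from globox import AnnotationSet

    # Hypothetical locations, for illustration only.
    dets_dir = Path("globox_test_data/dets")
    yolo_v5_dets_dir = dets_dir / "yolo_v5"

    # In tests/test_parsing.py and tests/test_conversion.py the new detections
    # would be parsed and appended to the dets_sets variable:
    yolo_v5_dets = AnnotationSet.from_yolo_v5(
        folder=yolo_v5_dets_dir,
        image_folder=Path("globox_test_data/images"),  # assumed location of the images
    )
    dets_sets = [yolo_v5_dets]  # plus the detection sets already listed in the tests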

topherbuckley commented 1 month ago

Thanks for the info on tests. I'll take a closer look at that sometime next week.

Just had a quick look through BoundingBox with regard to:

BoundingBox.from_yolov5_result, which can be implemented using BoundingBox.from_txt

Do you want me to avoid using BoundingBox.from_yolo and instead create BoundingBox.from_yolo_result for other potential future result parsers, to follow your current pattern? Or implement BoundingBox.from_txt directly inside BoundingBox.from_yolov5_result?

Edit:

The reason I ask is that from_yolo uses relative=True, which I could satisfy with the yolov5 Detections.xywhn output, and parsing the needed image_size is also possible, but that would be less straightforward (it means manipulating the tuple from Detections.ims[0].shape) than just using relative=False and ignoring image_size.
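For concreteness, here is roughly what I'm picturing for the absolute-coords route (only a sketch: the Annotation and BoundingBox constructor calls reflect my reading of the current source and may need adjusting):

    from globox import Annotation, BoundingBox

    def annotation_from_yolov5_result(results, index: int = 0) -> Annotation:
        """Sketch: build a globox Annotation from an Ultralytics YOLOv5 Detections object."""
        df = results.pandas().xyxy[index]  # absolute xmin/ymin/xmax/ymax, confidence, class, name
        height, width = results.ims[index].shape[:2]
        boxes = [
            BoundingBox(
                label=str(row["name"]),
                xmin=float(row["xmin"]),
                ymin=float(row["ymin"]),
                xmax=float(row["xmax"]),
                ymax=float(row["ymax"]),
                confidence=float(row["confidence"]),
            )
            for _, row in df.iterrows()
        ]
        return Annotation(image_id=results.files[index], image_size=(width, height), boxes=boxes)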

laclouis5 commented 2 weeks ago

Sorry for the delay. You can implement whichever you think is more convenient. I personally think that implementing everything from .from_txt is the most straightforward way of doing it.