rafaelpadilla / review_object_detection_metrics

Object Detection Metrics. 14 object detection metrics: mean Average Precision (mAP), Average Recall (AR), Spatio-Temporal Tube Average Precision (STT-AP). This project supports different bounding box formats as in COCO, PASCAL, Imagenet, etc.
Other
1.08k stars 215 forks source link

Thoughts about optimisations #66

Closed laclouis5 closed 2 years ago

laclouis5 commented 3 years ago

Thanks for this great library which brings more flexibility in the computation of many metrics, this was missing tool!

I quickly ran through the code and had some ideas about ways to optimize some parts of the code.

In my opinion, the BoundingBox class is responsible for too many things. For instance a BoundingBox object shouldn't deal with an image_name or image_size. First, this is highly redundant (all boxes from the same image have the same name and image size) thus taking lots of space, and second, this leaks abstraction and tightly couples a bounding box to a specific image while a bounding box should be image-agnostic.

Coordinates are also redundant, _w and _h can be computed from _x, _x2 and _y, _y2.

Another redundancy is the confidence and bb_type as confidence already carry the the bb_type in its ability to be None. Last, storing type_coordinates and format can be avoided by fixing a unique bounding box format (lets say XYX2Y2 and Absolute).

A draft implementation using __slots__ to avoid memory overhead of Python objects would ressembles this:

class BoundingBox:

  """A bounding box in XYX2Y2 Absolute format."""

    __slots__ = ("label", "xmin", "ymin", "xmax", "ymax", "confidence")

    def __init__(self, label: str, xmin: float, ymin: float, xmax: float, ymax: float, confidence: float = None):
        if confidence: 
            assert 0.0 <= confidence <= 1.0, f"Confidence ({confidence}) should be in 0...1"
        self.label = label
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        self.confidence = confidence

    # `bb_type` can be computed instead of stored
    @property
    def bb_type(self) -> BBType:
        return BBType.GROUND_TRUTH if self.confidence is None else BBType.DETECTED

    # Add explicit init for different box format. This simplifies the __init__ and separates logic.
    @staticmethod
    def from_yolo(label, coords, confidence=None) -> "BoundingBox":
      ...

Then, another object would hold informations about one image and its annotations:

class ImageAnnotation:

    def __init__(self, image_path: str, image_size: "tuple[int, int]", boxes: "list[BoundingBox]" = None):
        self.image_path = image_path
        self.image_size = image_size
        self.boxes = boxes or []

    # Again, high-level API for reading an annotation in a specific format
    @staticmethod
    def from_yolo(txt_file: str, img_file: str) -> "ImageAnnotation":
        ...

A "dataset" is then just a sequence of ImageAnnotation.

The additional advantage of such a design is that bounding boxes are already clustered by image, and because many evaluation metrics operate image by image, there is an opportunity to parallelize lots of operations that do not depend on inter-image dependency.

For instance the _evaluate_image(...) function for COCO evaluation could be a method of ImageAnnotation and could be run in parallel for each image.

Last thing (I didn't dive into the details of the evaluators), it seems that there is many inefficient operations such as here in get_pascalvoc_metrics(...):

        # Loop through detections
        for idx_det, det in enumerate(dects):
            img_det = det.get_image_name()
            ...
            # Find ground truth image
            gt = [gt for gt in classes_bbs[c]['gt'] if gt.get_image_name() == img_det]

This last operation may be quadratic because it iterates every ground truth of class c for every detection of class c. This can be avoided by grouping classes_bbc by image_name (also, the design I described would simplifies this a lot).

Or here in a parser (same for textobb):

def openimage2bb(annotations_path, images_dir, bb_type=BBType.GROUND_TRUTH):
    ...
    for file_path in annotation_files:
        ...
        for i, row in csv.iterrows():
            ...
            image_file = general_utils.find_file(images_dir, img_name)

As find_file is worst case O(n) wrt to the number of files in the image directory, this can be quite expensive since it is nested in two loops.

And this utility function is gonna load the whole image in memory just to get its size:

https://github.com/rafaelpadilla/review_object_detection_metrics/blob/49a614a6ca59e39f2aecd085730cc2e4d30cd886/src/utils/general_utils.py#L201-L210

This may be improved with PIL.Image.open(image).size which seems to load image lazily and compute its size from image metadata.

Validators are also very heavy, for instance the function: https://github.com/rafaelpadilla/review_object_detection_metrics/blob/49a614a6ca59e39f2aecd085730cc2e4d30cd886/src/utils/validations.py#L344

does at least 3 whole file opening and reading.

rafaelpadilla commented 3 years ago

Hi @laclouis5 ,

Thank you for your message.

Excellent tips!!! I agree that this redesigning will improve memory consumption and allow parallelization in the evaluation process.

In the moment I am busy with another project. Would you be interested in contributing with us?

Best regards.

laclouis5 commented 3 years ago

Yes sure!

I'm currently writing my PhD thesis so I don't have much time neither unfortunately. The re-design will be quite long, I suggest to first address the biggest bottlenecks (the quadratic computations) so that the framework will be fast enough even for big datasets. I'll take a look when I got some spare time.

However, this will require unit tests to make sure there is no regression and a prior performance profiling on a big dataset (10 000 should be a good starting point) to assert that optimisations are beneficial.

I'll continue editing the first comment of this issue to list other bottlenecks.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.