Closed: laclouis5 closed this issue 2 years ago.
Hi @laclouis5 ,
Thank you for your message.
Excellent tips! I agree that this redesign would improve memory consumption and allow parallelization of the evaluation process.
At the moment I am busy with another project. Would you be interested in contributing?
Best regards.
Yes sure!
I'm currently writing my PhD thesis, so I don't have much time either, unfortunately. The redesign will take quite some time, so I suggest first addressing the biggest bottlenecks (the quadratic computations) so that the framework is fast enough even for big datasets. I'll take a look when I get some spare time.
However, this will require unit tests to make sure there is no regression, and a prior performance profiling on a big dataset (10,000 images should be a good starting point) to check that the optimizations are beneficial.
I'll continue editing the first comment of this issue to list other bottlenecks.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Thanks for this great library, which brings more flexibility to the computation of many metrics; this was a missing tool!
I quickly ran through the code and had some ideas about ways to optimize some parts of it.
In my opinion, the `BoundingBox` class is responsible for too many things. For instance, a `BoundingBox` object shouldn't deal with an `image_name` or an `image_size`. First, this is highly redundant (all boxes from the same image have the same image name and image size), thus taking up lots of space, and second, it leaks abstraction and tightly couples a bounding box to a specific image, while a bounding box should be image-agnostic.

Coordinates are also redundant: `_w` and `_h` can be computed from `_x`, `_x2` and `_y`, `_y2`.

Another redundancy is storing both `confidence` and `bb_type`, as `confidence` already carries the `bb_type` in its ability to be `None`. Last, storing `type_coordinates` and `format` can be avoided by fixing a unique bounding box format (let's say `XYX2Y2` and `Absolute`).

A draft implementation using `__slots__` to avoid the memory overhead of Python objects would resemble the sketch below. Another object would then hold the information about one image and its annotations.
A "dataset" is then just a sequence of
ImageAnnotation
.The additional advantage of such a design is that bounding boxes are already clustered by image, and because many evaluation metrics operate image by image, there is an opportunity to parallelize lots of operations that do not depend on inter-image dependency.
For instance, the `_evaluate_image(...)` function for COCO evaluation could be a method of `ImageAnnotation` and could be run in parallel for each image, as in the sketch below.
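A rough sketch of the idea (`evaluate_image` stands for a hypothetical per-image evaluation function, not the library's actual API):

```python
from concurrent.futures import ProcessPoolExecutor

def evaluate_image(annotation):
    # Hypothetical: match the detections of one ImageAnnotation against
    # its ground truths; this depends only on the data of a single image.
    ...

def evaluate_dataset(dataset):
    # Since images are independent of each other, the per-image results
    # can be computed in parallel and reduced afterwards.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(evaluate_image, dataset))
```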
Last thing (I didn't dive into the details of the evaluators): there seem to be many inefficient operations, for instance in `get_pascalvoc_metrics(...)`, which iterates every ground truth of class `c` for every detection of class `c`. This operation may be quadratic. It can be avoided by grouping `classes_bbc` by `image_name` (also, the design I described above would simplify this a lot).
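For illustration, a possible fix (assuming each box exposes the name of its image, spelled `image_name` here):

```python
from collections import defaultdict

def group_by_image(boxes):
    # One linear pass builds an image_name -> boxes mapping; each detection
    # is then matched only against the ground truths of its own image,
    # an O(1) lookup instead of a scan over every ground truth of the class.
    groups = defaultdict(list)
    for box in boxes:
        groups[box.image_name].append(box)
    return groups
```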
Or here in a parser (same for `textobb`): `find_file` is called inside two nested loops, and since it is worst-case `O(n)` with respect to the number of files in the image directory, this can be quite expensive.
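One way out (a sketch, not the library's code) is to index the directory once:

```python
import os

def index_files(directory):
    # A single O(n) directory scan builds a name -> path mapping, so each
    # lookup inside the nested loops becomes O(1) instead of a new scan.
    return {name: os.path.join(directory, name) for name in os.listdir(directory)}
```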
And this utility function is going to load the whole image into memory just to get its size:
https://github.com/rafaelpadilla/review_object_detection_metrics/blob/49a614a6ca59e39f2aecd085730cc2e4d30cd886/src/utils/general_utils.py#L201-L210
This may be improved with `PIL.Image.open(image).size`, which seems to load the image lazily and compute its size from the image metadata, for instance:
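A sketch of the lazy variant (`get_image_size` is an illustrative name):

```python
from PIL import Image

def get_image_size(image_path):
    # Image.open() only reads the file header, so the size is available
    # without decoding the pixel data into memory.
    with Image.open(image_path) as img:
        return img.size  # (width, height)
```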
Validators are also very heavy. For instance, this function:
https://github.com/rafaelpadilla/review_object_detection_metrics/blob/49a614a6ca59e39f2aecd085730cc2e4d30cd886/src/utils/validations.py#L344
opens and reads whole files at least three times.