huggingface / evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
https://huggingface.co/docs/evaluate
Apache License 2.0

Add COCO evaluation metrics #111

Open NielsRogge opened 3 years ago

NielsRogge commented 3 years ago

I'm currently working on adding Facebook AI's DETR model (end-to-end object detection with Transformers) to HuggingFace Transformers. The model is working fine, but regarding evaluation, I'm currently relying on external CocoEvaluator and PanopticEvaluator objects which are defined in the original repository (here and here respectively).

Running these in a notebook gives you nice summaries like this: [image: COCO-style AP/AR summary table]
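For context, this is roughly how the CocoEvaluator from the original repository is driven (a sketch assuming a pycocotools COCO object coco_gt, a validation data loader, and a postprocess step that turns raw model outputs into per-image dicts with "boxes", "scores" and "labels"; postprocess is a stand-in here, not a function from the repo):

from coco_eval import CocoEvaluator  # from the original DETR repository

coco_evaluator = CocoEvaluator(coco_gt, iou_types=["bbox"])

for images, targets in data_loader_val:
    outputs = model(images)
    # convert raw outputs to absolute xyxy boxes with scores and labels, keyed by image_id
    results = postprocess(outputs, targets)
    predictions = {target["image_id"].item(): output for target, output in zip(targets, results)}
    coco_evaluator.update(predictions)

coco_evaluator.synchronize_between_processes()
coco_evaluator.accumulate()
coco_evaluator.summarize()  # prints the AP/AR table shown above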

It would be great if we could import these metrics from the Datasets library, something like this:

import datasets

metric = datasets.load_metric('coco')

for model_input, gold_references in evaluation_dataset:
    model_predictions = model(model_input)
    metric.add_batch(predictions=model_predictions, references=gold_references)

final_score = metric.compute()

I think this would be great for object detection and semantic/panoptic segmentation in general, not just for DETR. Reproducing results of object detection papers would be way easier.

However, object detection and panoptic segmentation evaluation is a bit more complex than accuracy (it's more like a summary of metrics at different thresholds rather than a single number). I'm not sure how to proceed here, but happy to help make this possible.
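For reference, by "summary of metrics" I mean the kind of table pycocotools produces. A minimal sketch, assuming ground truth and detections are already saved as COCO-format JSON files (the file names below are placeholders):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("instances_val2017.json")       # ground-truth annotations (placeholder path)
coco_dt = coco_gt.loadRes("predictions.json")  # detections in the COCO result format

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()   # prints AP/AR at IoU=0.50:0.95, 0.50, 0.75 and per object size
print(coco_eval.stats)  # the same 12 numbers as a NumPy array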

bhavitvyamalik commented 3 years ago

Hi @NielsRogge, I'd like to contribute these metrics to datasets. Let's start with CocoEvaluator first? Currently, how are you sending the ground truths and predictions to coco_evaluator?

NielsRogge commented 3 years ago

Great!

Here's a notebook that illustrates how I'm using CocoEvaluator: https://drive.google.com/file/d/1VV92IlaUiuPOORXULIuAdtNbBWCTCnaj/view?usp=sharing

The evaluation is near the end of the notebook.

bhavitvyamalik commented 3 years ago

I went through the code you've mentioned and I think there are 2 options for how we can go ahead:

1) Implement it the way the DETR authors have done it (they're relying very heavily on the official implementation and they're focusing on a torch dataset here). I feel ours should be something generic instead of PyTorch-specific.
2) Do this implementation, where the user converts their outputs and ground-truth annotations to a pre-defined format and then feeds them into our function to calculate the metrics (looks very similar to what you wanted above; see the format sketch below).

In my opinion, the 2nd option looks very clean, but I'm still figuring out how it transforms the box coordinates of coco_gt, which you passed to CocoEvaluator as the ground truth for evaluation. Since your model output was already converted to the COCO API format, I ran into few problems there.
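For concreteness, here is a sketch of such a pre-defined format, following the standard COCO annotation and detection-result schemas (all values below are placeholders):

# COCO-style ground truth (abridged): images, annotations and categories.
ground_truth = {
    "images": [{"id": 1, "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 3,
         "bbox": [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height]
         "area": 4000.0, "iscrowd": 0},
    ],
    "categories": [{"id": 3, "name": "car"}],
}

# COCO detection result format: one entry per predicted box.
predictions = [
    {"image_id": 1, "category_id": 3,
     "bbox": [102.0, 118.0, 52.0, 79.0],  # [x, y, width, height]
     "score": 0.92},
]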

NielsRogge commented 3 years ago

Ok, thanks for the update.

Indeed, the metrics API of Datasets is framework agnostic, so we can't rely on a PyTorch-only implementation.

This file is probably what we need to implement.
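For reference, a bare-bones sketch of what a metric module in Datasets looks like; the class name and feature schema below are illustrative placeholders, not a proposed design:

import datasets

class Coco(datasets.Metric):
    def _info(self):
        return datasets.MetricInfo(
            description="COCO-style detection metrics (AP/AR at several IoU thresholds).",
            citation="",
            features=datasets.Features({
                # placeholder schema; real inputs need boxes, labels, scores, image ids, ...
                "predictions": datasets.Value("string"),
                "references": datasets.Value("string"),
            }),
        )

    def _compute(self, predictions, references):
        # delegate to a framework-agnostic evaluator (e.g. pycocotools) here
        return {"AP": 0.0}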

kadirnar commented 1 year ago

Hi @lvwerra

Do you plan to add a 3rd-party integration for the COCO mAP metric?

roboserg commented 11 months ago

Is there any update on this? What would be the recommended way of doing COCO eval with Huggingface?

NielsRogge commented 11 months ago

Yes there's an update on this. @rafaelpadilla has been working on adding native support for COCO metrics in the evaluate library, check the Space here: https://huggingface.co/spaces/rafaelpadilla/detection_metrics. For now you have to load the metric as follows:

import evaluate

evaluator = evaluate.load("rafaelpadilla/detection_metrics", json_gt=ground_truth_annotations, iou_type="bbox")

but this one is going to be integrated into the main evaluate library.

This is then leveraged to create the open object detection leaderboard: https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard.

rafaelpadilla commented 11 months ago

Yep, we intend to integrate it into the evaluate library.

Meanwhile you can use it from here: https://huggingface.co/spaces/rafaelpadilla/detection_metrics

Update: the code with the evaluate AP metric and its variations was transferred to https://huggingface.co/spaces/hf-vision/detection_metrics

maltelorbach commented 7 months ago

Hi, running

import evaluate
evaluator = evaluate.load("hf-vision/detection_metrics", json_gt=ground_truth_annotations, iou_type="bbox")

results in the following error:

ImportError: To be able to use hf-vision/detection_metrics, you need to install the following dependencies['detection_metrics'] using 'pip install detection_metrics' for instance'

How do I load the metric from the hub? Do I need to download the content of that repository manually first?

I'm running evaluate==0.4.1.

sushil-bharati commented 5 months ago

Ran into the same issue @maltelorbach posted on 12/14/2023

sklum commented 2 weeks ago

I spent some time digging into this. The issue is that the hf-vision/detection_metrics metric uses a local module for some COCO-related dependencies (it's called detection_metrics, which is why you get an ImportError of that flavor). I tried to restructure the Space to have a flat directory structure, but then ran into https://github.com/huggingface/evaluate/issues/189 because certain dependencies aren't loaded (or downloaded). I gave up after that. It seems telling that the object detection example just rolls its own metric code with torchmetrics, so it's probably easiest to do that.

NielsRogge commented 2 weeks ago

Yes, for now we switched to using Torchmetrics, as it already provides a performant implementation with support for distributed training etc., so there's no need to duplicate it. cc @qubvel
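For anyone landing here, a minimal Torchmetrics sketch (the boxes, scores and labels below are placeholder values; by default boxes are expected in absolute xyxy format and IoU is computed on bounding boxes):

import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision()  # defaults: box_format="xyxy", iou_type="bbox"

preds = [{
    "boxes": torch.tensor([[258.0, 41.0, 606.0, 285.0]]),  # placeholder prediction
    "scores": torch.tensor([0.54]),
    "labels": torch.tensor([0]),
}]
target = [{
    "boxes": torch.tensor([[214.0, 41.0, 562.0, 285.0]]),  # placeholder ground truth
    "labels": torch.tensor([0]),
}]

metric.update(preds, target)
print(metric.compute())  # dict with map, map_50, map_75, map_small, mar_100, ...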