huggingface / evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
https://huggingface.co/docs/evaluate
Apache License 2.0

Speeding up mean_iou metric computation #569

Closed qubvel closed 3 months ago

qubvel commented 3 months ago

While working with the semantic-segmentation example, I observed that computing the mean_iou metric takes a significant amount of time (comparable to the training loop itself).

The cause of this behavior is the conversion of the resulting numpy segmentation maps to the dataset format. Currently, the mean_iou metric expects all segmentation arrays to be cast to datasets.Sequence(datasets.Sequence(datasets.Value("uint16"))), which means every single element of every array is converted individually.
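As a rough illustration of that per-element cost (a standalone sketch, not the metric's actual code path): the nested-Sequence feature behaves roughly like converting each array to nested Python lists, touching every element, whereas an image-style feature can serialize each 2D map in a single call.

```python
import time
import numpy as np

# A batch of segmentation maps similar in shape to those passed to mean_iou
maps = np.random.randint(0, 10, size=(100, 256, 256), dtype=np.uint16)

# Nested-Sequence-style handling: roughly an element-by-element conversion,
# here approximated by turning every array into nested Python lists
start = time.time()
as_lists = [m.tolist() for m in maps]
list_time = time.time() - start

# Image-style handling: each 2D array is serialized in one shot
start = time.time()
as_bytes = [m.tobytes() for m in maps]
bytes_time = time.time() - start

print(f"nested lists: {list_time:.3f}s, raw bytes: {bytes_time:.4f}s")
```

The gap between the two timings gives an intuition for where the original slowdown comes from, although the real conversion inside datasets involves more machinery than this sketch.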

This PR speeds up mean_iou by changing the feature type to datasets.Image().

Here is a short script to measure the computation time:

import time
import numpy as np
import evaluate

image_size = 256
num_images = 100
num_labels = 10

# Prepare some random data
np.random.seed(4215)
references = np.random.rand(num_images, image_size, image_size) * (num_labels - 1)
predictions = np.random.rand(num_images, image_size, image_size) * (num_labels - 1)

references = references.round().astype(np.uint16)
predictions = predictions.round().astype(np.uint16)

# Load the slow and fast implementations
slow_iou = evaluate.load("mean_iou")  # the one from evaluate lib
faster_iou = evaluate.load("./metrics/mean_iou/")  # the local, modified one

# Track the time taken for each implementation
slow_iou_start = time.time()
slow_iou_results = slow_iou.compute(
    predictions=predictions,
    references=references,
    num_labels=num_labels,
    ignore_index=0,
    reduce_labels=False,
)
slow_iou_time = time.time() - slow_iou_start
slow_mean_iou = slow_iou_results["mean_iou"]
print(f"Slow IOU: {slow_mean_iou:.3f} in {slow_iou_time:.2f} seconds")

faster_iou_start = time.time()
faster_iou_results = faster_iou.compute(
    predictions=predictions,
    references=references,
    num_labels=num_labels,
    ignore_index=0,
    reduce_labels=False,
)
faster_iou_time = time.time() - faster_iou_start
faster_mean_iou = faster_iou_results["mean_iou"]
print(f"Faster IOU: {faster_mean_iou:.3f} in {faster_iou_time:.2f} seconds")

# Check that the results match
assert np.isclose(slow_mean_iou, faster_mean_iou), "IOU values do not match"

# >>> Slow IOU: 0.052 in 11.73 seconds
# >>> Faster IOU: 0.052 in 0.26 seconds

As a result, we get a 5-50x speedup in metric computation, depending on the number of images, the image size, and the number of classes.
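For context on what the metric computes, mean IoU can be expressed compactly in plain NumPy via a confusion matrix. The sketch below is an independent illustration, not the library's implementation; in particular, the ignore_index handling here (masking reference pixels) is an assumption about the intended semantics.

```python
import numpy as np

def mean_iou_numpy(predictions, references, num_labels, ignore_index=None):
    """Per-class IoU averaged over classes, via a joint label histogram."""
    preds = np.asarray(predictions).ravel()
    refs = np.asarray(references).ravel()
    if ignore_index is not None:
        # Drop pixels whose reference label should be ignored
        keep = refs != ignore_index
        preds, refs = preds[keep], refs[keep]
    # Confusion matrix: rows = reference labels, columns = predicted labels
    cm = np.bincount(refs * num_labels + preds, minlength=num_labels ** 2)
    cm = cm.reshape(num_labels, num_labels)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = intersection / union  # NaN for classes absent from both
    return float(np.nanmean(iou))
```

For example, with predictions [[0, 1], [1, 1]] and references [[0, 1], [0, 1]] over 2 labels, class 0 has IoU 1/2 and class 1 has IoU 2/3, giving a mean of 7/12.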

P.S. The PR also fixes the broken mean_iou example in the README (https://github.com/huggingface/evaluate/issues/563).