huggingface / evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
https://huggingface.co/docs/evaluate
Apache License 2.0

Metrics for multilabel problems don't match the expected format. #585

Closed adamamer20 closed 2 months ago

adamamer20 commented 2 months ago

Issue

Evaluation metrics cannot be used for multilabel classification problems.

Reproducible example

You can find a reproducible snippet here

Problem explanation

The error comes from the expected input format chosen for some metrics. For example, for accuracy and for f1 with average="micro" or average="macro", the expected format of each prediction/reference is a scalar (Value(dtype='int32', id=None)), so validation breaks down in the multilabel case (ValueError: Predictions and/or references don't match the expected format.). Apart from the hassle of reshaping predictions and labels, and the confusion over which indices correspond to the same label and which to the same instance, this differs from other libraries: scikit-learn accepts nested lists for multilabel f1.
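
A minimal sketch of the failure, with made-up label arrays standing in for the linked snippet:

```python
import evaluate

# The default "f1" config declares predictions/references as scalar int32,
# so nested multilabel arrays fail input validation.
f1 = evaluate.load("f1")
f1.compute(
    predictions=[[1, 0, 1], [0, 1, 0]],
    references=[[1, 0, 0], [0, 1, 1]],
    average="macro",
)
# -> ValueError: Predictions and/or references don't match the expected format.
```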

Possible solution

Refactor the expected format in the EvaluationModule for accuracy and f1 (+ others...) so that it also accepts Sequence.
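
For illustration only, a hypothetical sketch (using datasets feature types, not the library's actual metric code) of a feature spec that would accept per-sample label vectors:

```python
import datasets

# Hypothetical example: declaring predictions/references as sequences of
# int32 would let multilabel indicator arrays pass validation.
multilabel_features = datasets.Features(
    {
        "predictions": datasets.Sequence(datasets.Value("int32")),
        "references": datasets.Sequence(datasets.Value("int32")),
    }
)
```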

shenxiangzhuang commented 2 months ago

Hi @adamamer20 , did you try to use f1_metric = evaluate.load("f1", "multilabel")?

Your question is similar to #550
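
For reference, a minimal sketch of that suggestion (label arrays are made up):

```python
import evaluate

# Loading the metric with the "multilabel" config switches the expected
# feature type to Sequence(Value("int32")), so nested label arrays are accepted.
f1_metric = evaluate.load("f1", "multilabel")
result = f1_metric.compute(
    predictions=[[1, 0, 1], [0, 1, 0]],
    references=[[1, 0, 0], [0, 1, 1]],
    average="macro",  # an averaging mode such as "macro" has to be chosen for multilabel input
)
print(result)  # {'f1': ...}
```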

adamamer20 commented 2 months ago

Thank you, it worked. I tried searching the docs, but there isn't anything on multilabel.