pyannote / pyannote-metrics

A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
http://pyannote.github.io/pyannote-metrics
MIT License

Question: Which metric for validating a multi-label classifier? #27

Closed (MarvinLvn closed this issue 5 years ago)

MarvinLvn commented 5 years ago

Hi Hervé!

I'm using pyannote to build a multi-label classifier for speech utterances. The idea is to output a vector whose dimensions are:

1) the key child (the one wearing the microphone)
2) other children
3) female speech
4) male speech
5) overlapping speech (optional)

And I'm wondering about the best way to validate my model.

I came up with two options:

1) Detection error rate averaged (weighted or not) across all classes
2) Identification error rate
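To make option 1 concrete, here is a minimal sketch in plain Python. It assumes each class is scored on its own as a binary frame-level detection task, with detection error rate taken as (false alarms + misses) / active reference frames. The frame lists, the class names, and the helper functions are hypothetical illustrations, not the pyannote.metrics API.

```python
# Hypothetical frame-level labels: one list of 0/1 flags per class.
reference = {
    "key_child": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "female":    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
}
hypothesis = {
    "key_child": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # one missed frame
    "female":    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],  # perfect detection
}


def detection_errors(ref, hyp):
    """Return (false alarms, misses, active reference frames) for one class."""
    false_alarm = sum(1 for r, h in zip(ref, hyp) if h and not r)
    miss = sum(1 for r, h in zip(ref, hyp) if r and not h)
    return false_alarm, miss, sum(ref)


def average_detection_error_rate(reference, hypothesis, weighted=False):
    """Average the per-class detection error rate over all classes.

    With weighted=True, each class counts in proportion to its number of
    active reference frames; otherwise every class counts equally.
    """
    rates, totals = [], []
    for label in reference:
        fa, miss, total = detection_errors(reference[label], hypothesis[label])
        if total == 0:
            continue  # rate is undefined for a class absent from the reference
        rates.append((fa + miss) / total)
        totals.append(total)
    if weighted:
        return sum(r * t for r, t in zip(rates, totals)) / sum(totals)
    return sum(rates) / len(rates)


macro = average_detection_error_rate(reference, hypothesis)
weighted = average_detection_error_rate(reference, hypothesis, weighted=True)
print(macro, weighted)
```

In this toy data the rare class has half of its frames missed while the frequent class is perfect, so the unweighted average is noticeably higher than the weighted one. The choice between the two decides how much rare classes count.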

Since the task I want to solve is neither diarization nor speech activity detection, I'd like your thoughts on which metric would be best suited.

Thanks!

hbredin commented 5 years ago

The problem with identification error rate is that the majority class will outweigh all the other classes in the metric.

You could also look at precision and recall for each class.
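As a rough illustration of per-class precision and recall, here is a sketch in plain Python. It assumes frame-level 0/1 labels per class; the data, the class names, and the `precision_recall` helper are hypothetical and are not taken from pyannote.metrics.

```python
# Hypothetical frame-level labels: one list of 0/1 flags per class.
reference = {
    "key_child": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "female":    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
}
hypothesis = {
    "key_child": [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # one miss, one false alarm
    "female":    [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],  # perfect detection
}


def precision_recall(ref, hyp):
    """Return (precision, recall) for one class; None where undefined."""
    tp = sum(1 for r, h in zip(ref, hyp) if r and h)
    fp = sum(1 for r, h in zip(ref, hyp) if h and not r)
    fn = sum(1 for r, h in zip(ref, hyp) if r and not h)
    precision = tp / (tp + fp) if tp + fp else None  # nothing was predicted
    recall = tp / (tp + fn) if tp + fn else None     # nothing to retrieve
    return precision, recall


# Score every class separately so that no class hides behind another.
scores = {
    label: precision_recall(reference[label], hypothesis[label])
    for label in reference
}
for label, (precision, recall) in scores.items():
    print(label, precision, recall)
```

Reporting one precision/recall pair per class keeps the weak result on the rare class visible, whereas a single pooled score on this data would be dominated by the frequent class.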