srvk / DiViMe

ACLEW Diarization Virtual Machine
Apache License 2.0

Rethink the evaluation pipeline #108

Closed MarvinLvn closed 5 years ago

MarvinLvn commented 5 years ago

Following Alex's advice, I just read the following paper:

https://github.com/pyannote/pyannote-metrics/blob/master/docs/pyannote-metrics.pdf

presenting pyannote.metrics, a toolkit for evaluating speaker diarization systems. It basically gets rid of a lot of things I don't like in the current pipeline:

  1. Diarization evaluation relies only on the DER. The first step of this metric is to compute an optimal one-to-one mapping between the reference and hypothesis classes, which raises some questions: what happens when the number of classes differs between the reference and hypothesis files? What is the impact on the metric when the one-to-one mapping fails? Moreover, in some cases we are wasting information that we already have: when we know the mapping between the reference and hypothesis classes, we want to use it. There is no need to compute it.

  2. Diagnostic capabilities: currently, there are none. What kinds of errors does our model make? Does it over-segment the audio (many short clusters), as LENA seems to do (even though we don't have any metrics to prove it)? Does it under-segment it (few long clusters)? On which specific speakers does our model fail? These are all interesting questions whose answers can help solve a specific problem (when improving or assessing a model).

  3. The wall between diarization and speech activity detection (SAD) evaluation. The SAD and diarization tasks are similar, and the difference in the evaluation pipeline should only appear at the end (in the choice of metrics).
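To make points 1 and 2 concrete, here is a minimal sketch in plain Python. It is not code from pyannote.metrics or from the current pipeline: the function names, the `(start, end, label)` segment layout, and the `CHI`/`FEM` labels are all hypothetical, and it assumes that segments within one annotation never overlap each other. It shows two things: purity and coverage as diagnostics for over- and under-segmentation, and an error rate that uses a known hypothesis-to-reference mapping instead of computing an optimal one.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A segment is (start_seconds, end_seconds, label).
# Assumption: segments within one annotation never overlap each other.
Segment = Tuple[float, float, str]


def duration(segments: List[Segment]) -> float:
    """Total labelled time in an annotation."""
    return sum(end - start for start, end, _ in segments)


def overlap_table(reference: List[Segment],
                  hypothesis: List[Segment]) -> Dict[Tuple[str, str], float]:
    """Shared duration for every (reference label, hypothesis label) pair."""
    table: Dict[Tuple[str, str], float] = defaultdict(float)
    for r_start, r_end, r_label in reference:
        for h_start, h_end, h_label in hypothesis:
            shared = min(r_end, h_end) - max(r_start, h_start)
            if shared > 0:
                table[(r_label, h_label)] += shared
    return table


def purity(reference: List[Segment], hypothesis: List[Segment]) -> float:
    """Share of hypothesis time explained by each cluster's dominant speaker."""
    best: Dict[str, float] = defaultdict(float)
    for (_, h_label), shared in overlap_table(reference, hypothesis).items():
        best[h_label] = max(best[h_label], shared)
    return sum(best.values()) / duration(hypothesis)


def coverage(reference: List[Segment], hypothesis: List[Segment]) -> float:
    """Dual of purity: share of reference time held by each speaker's main cluster."""
    return purity(hypothesis, reference)


def error_rate_with_mapping(reference: List[Segment],
                            hypothesis: List[Segment],
                            mapping: Dict[str, str]) -> float:
    """(missed + false alarm + confusion) / reference time, with a given mapping.

    `mapping` sends each hypothesis label to the reference label it stands for,
    so no optimal one-to-one mapping has to be computed.
    """
    table = overlap_table(reference, hypothesis)
    matched = sum(table.values())
    correct = sum(s for (r, h), s in table.items() if mapping.get(h) == r)
    missed = duration(reference) - matched
    false_alarm = duration(hypothesis) - matched
    confusion = matched - correct
    return (missed + false_alarm + confusion) / duration(reference)


# Toy data (hypothetical labels): two reference speakers, 10 seconds each.
reference = [(0, 10, "CHI"), (10, 20, "FEM")]
over_segmented = [(0, 5, "c1"), (5, 10, "c2"), (10, 15, "c3"), (15, 20, "c4")]
under_segmented = [(0, 20, "c1")]
```

On this toy data, the over-segmented hypothesis has purity 1.0 and coverage 0.5, while the under-segmented one has purity 0.5 and coverage 1.0. Reading the two numbers together is what tells the two failure modes apart, which a single DER value cannot do.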

MarvinLvn commented 5 years ago

Example of diarization evaluation output:

diarization_eval

All metrics that have been integrated so far:

lena_ber_5750_030120_030240

--> Still needs to be improved (I spent an hour trying to set the segments' transparency, without success! T_T)

MarvinLvn commented 5 years ago

Well, I think it's done! I'm closing this issue.