pyannote / pyannote-metrics

A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
http://pyannote.github.io/pyannote-metrics
MIT License

Add option to ignore overlapping speech regions #15

Closed: Jamiroquai88 closed this issue 7 years ago

Jamiroquai88 commented 7 years ago

Hi, is there a way to ignore overlapping speech in DER scoring? It is quite common to use a collar of size 0.25 (I know how to do that), but I cannot see an option to ignore overlaps. Thanks

hbredin commented 7 years ago

What does "ignoring overlaps" mean exactly?

At least two different ways of doing this come to mind:

I don't have time right now to work on this but I'd gladly consider a pull request adding this feature.

The former is easier to implement than the latter, and could easily be added to UEMSupportMixin (https://github.com/pyannote/pyannote-metrics/blob/develop/pyannote/metrics/utils.py#L35) and to all metrics that rely on this mixin.

Jamiroquai88 commented 7 years ago

Ignoring overlaps means that overlapped segments are not evaluated. From the md-eval.pl script: "to limit scoring to those time regions in which only a single speaker is speaking".

hbredin commented 7 years ago

"limit scoring to those time regions in which only a single speaker is speaking"

Does this mean that regions in which no speaker is speaking are removed as well? In other words, should non-speech regions be discarded as well? This could have a huge impact on files with a lot of non-speech regions...

Jamiroquai88 commented 7 years ago

No, non-speech regions are still scored as part of DER; only overlapped regions are left out. The point is that a very common DER setup uses a collar of size 0.25 and does not score overlapped regions, because overlapped segments are too difficult to detect and assign.

hbredin commented 7 years ago

OK. Got it.

Not sure "being too difficult" is a good justification for adding such a feature, though... On the other hand, the fact that this is a very common DER setup is.

Therefore, I'll try to implement this in the next release, but I cannot provide any ETA.

In the meantime, I would gladly guide you if you'd like to add it yourself. This is (more or less) just a matter of finding overlapping regions (using reference.co_iter(reference), for instance) and then extruding those regions from uem.
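For readers following along, the two steps just described (find the regions where speakers overlap, then remove them from the evaluation map) can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the pyannote API, the helper names overlap_regions and extrude are hypothetical, and the reference is assumed to be a simple list of (start, end, speaker) tuples with the uem given as a list of (start, end) regions.

```python
from itertools import combinations


def overlap_regions(reference):
    """Return merged (start, end) regions where two different speakers talk at once.

    `reference` is assumed to be a list of (start, end, speaker) tuples.
    """
    raw = []
    for (s1, e1, spk1), (s2, e2, spk2) in combinations(reference, 2):
        if spk1 == spk2:
            continue  # two segments of the same speaker are not an overlap
        start, end = max(s1, s2), min(e1, e2)
        if start < end:
            raw.append((start, end))

    # Merge intersecting regions so the result is sorted and non-overlapping.
    merged = []
    for start, end in sorted(raw):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def extrude(uem, removed):
    """Remove `removed` regions (sorted, non-overlapping) from each `uem` region."""
    kept = []
    for start, end in uem:
        cursor = start
        for r_start, r_end in removed:
            if r_end <= cursor or r_start >= end:
                continue  # this removed region does not touch what is left
            if r_start > cursor:
                kept.append((cursor, r_start))
            cursor = max(cursor, r_end)
        if cursor < end:
            kept.append((cursor, end))
    return kept


# Hypothetical toy data: speaker B overlaps A on [8, 10] and on [18, 20].
reference = [(0, 10, "A"), (8, 20, "B"), (18, 25, "A")]
uem = [(0, 30)]

overlaps = overlap_regions(reference)
scored = extrude(uem, overlaps)
print(overlaps)  # [(8, 10), (18, 20)]
print(scored)    # [(0, 8), (10, 18), (20, 30)]
```

Note that the non-speech region [25, 30] stays inside the scored regions; only the overlapped parts are dropped, which matches the behaviour requested earlier in this thread.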

hbredin commented 7 years ago

A new skip_overlap option has been added. It is available in the develop branch and will be part of the next official release.

@Jamiroquai88 I would really appreciate it if you could give me some feedback on this.