Closed liutaocode closed 2 years ago
Not sure which one is standard, given that pyannote.metrics was released before dscore. Both were released after NIST's md-eval.pl. Also, here is a nice post by @desh2608 comparing those tools and introducing yet another one (spyder): https://desh2608.github.io/2021-03-05-spyder/
Regarding the collar convention used in pyannote.metrics, this is documented here:
https://pyannote.github.io/pyannote-metrics/reference.html#evaluation-metrics
That being said, I recommend you use collar = 0.0 when reporting results.
For very dynamic conversations with lots of short speech turns, using a 250 ms collar may actually remove more than half of the conversation (and usually the more difficult half), leading to over-optimistic reported diarization error rates.
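To make the point about short turns concrete, here is a rough pure-Python sketch (not code from pyannote.metrics or dscore) that estimates how much reference speech is excluded when a margin is removed on each side of every turn boundary. The helper name `excluded_fraction`, its `per_side` parameter, and the toy conversation of back-to-back 0.8 s turns are all assumptions made up for illustration; it also assumes turns do not overlap.

```python
def excluded_fraction(segments, per_side):
    """Fraction of reference speech lying within `per_side` seconds of a turn boundary.

    `segments` is a list of (start, end) tuples in seconds, assumed non-overlapping.
    This is an illustrative helper, not part of any evaluation tool.
    """
    # One exclusion window per reference boundary (start and end of each turn).
    windows = sorted(
        (b - per_side, b + per_side)
        for start, end in segments
        for b in (start, end)
    )
    # Merge overlapping windows so that no time is counted twice.
    merged = []
    for lo, hi in windows:
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    total = sum(end - start for start, end in segments)
    removed = sum(
        max(0.0, min(end, hi) - max(start, lo))
        for start, end in segments
        for lo, hi in merged
    )
    return removed / total


# Hypothetical toy conversation: ten back-to-back turns of 0.8 s each.
turns = [(i * 0.8, (i + 1) * 0.8) for i in range(10)]

half_each_side = excluded_fraction(turns, per_side=0.125)  # 250 ms split in two
full_each_side = excluded_fraction(turns, per_side=0.25)   # 250 ms on each side
print(half_each_side, full_each_side)
```

On this toy input, a little under a third of the speech is excluded with 125 ms per side, and more than 60% with 250 ms per side, which is the kind of loss described above.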
Thanks for your quick reply, I have learnt a lot from it. The way pyannote applies the collar is different from dscore. To stay consistent with dscore, we can set the pyannote collar to twice the dscore collar (2 * collar).
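The conversion can be written down as a tiny sketch, assuming the two conventions are exactly as described in this thread (pyannote splits the collar in half around each boundary, dscore applies it in full on each side). Both function names are hypothetical helpers, not part of either tool.

```python
def per_side_margin(collar, convention):
    """Seconds excluded on each side of a reference boundary (as described in this thread)."""
    if convention == "pyannote":
        # The collar is the total width of the window around the boundary.
        return collar / 2
    if convention == "dscore":
        # The collar is applied in full before and after the boundary.
        return collar
    raise ValueError(f"unknown convention: {convention}")


def to_pyannote_collar(dscore_collar):
    """Collar to pass to pyannote to match a given dscore collar."""
    return 2 * dscore_collar


# A dscore collar of 0.25 s corresponds to a pyannote collar of 0.5 s.
matched = to_pyannote_collar(0.25)
assert per_side_margin(matched, "pyannote") == per_side_margin(0.25, "dscore")
```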
Recently, I used pyannote.metrics with a collar of 0.25 s, but I found that its result differs from that of another widely used evaluation tool, dscore.
I think this is caused by the collar having a different meaning in each tool. In pyannote, the beginning and end of each segment are trimmed by half of the collar size, whereas in dscore they are trimmed by the full collar size.
So I would like to ask why pyannote applies the collar differently, as I find this quite confusing.