Closed by hhd52859 4 months ago
There is no direct relationship between the two as the second one needs an extra clustering step used to stitch chunks together.
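To illustrate why the stitching step matters, here is a minimal sketch (not pyannote's implementation). It assumes one speaker per frame and hypothetical labels: the reference uses "A" and "B", while each chunk uses its own arbitrary local indices 0 and 1. The helper name `confusion_frames` is made up for this example.

```python
from itertools import permutations


def confusion_frames(reference, hypothesis):
    """Count mislabeled frames under the best one-to-one label mapping.

    Assumes one speaker per frame and that the hypothesis uses no more
    distinct labels than the reference.
    """
    ref_labels = sorted(set(reference))
    hyp_labels = sorted(set(hypothesis))
    best_correct = 0
    # Try every assignment of hypothesis labels to distinct reference labels.
    for perm in permutations(ref_labels, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        correct = sum(mapping[h] == r for r, h in zip(reference, hypothesis))
        best_correct = max(best_correct, correct)
    return len(reference) - best_correct


# Hypothetical 8-frame file, processed as two 4-frame chunks.
reference = ["A", "A", "B", "B", "A", "A", "B", "B"]
chunk_hyps = [[0, 0, 1, 1], [1, 1, 0, 0]]  # local indices differ per chunk

# Chunk-level evaluation: each chunk gets its own optimal mapping.
local_errors = [
    confusion_frames(reference[0:4], chunk_hyps[0]),
    confusion_frames(reference[4:8], chunk_hyps[1]),
]

# File-level evaluation without stitching: one mapping for the whole file.
naive_global = chunk_hyps[0] + chunk_hyps[1]
global_errors = confusion_frames(reference, naive_global)

print("local confusion per chunk:", local_errors)
print("global confusion without stitching:", global_errors)
```

Both chunks are perfect locally, yet the concatenated output confuses half of the frames because local index 0 refers to a different speaker in each chunk. Resolving that is the job of the clustering step.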
You cannot. That being said, DER is the sum of three terms (false alarm, missed detection, speaker confusion). Better local FA or MD will directly translate into better global FA or MD. Global SC, on the other hand, cannot be inferred from local SC (because of the extra clustering step mentioned above).
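As a small worked example of the three-term decomposition, the sketch below assumes the usual normalization by total reference speech duration. The function name and the durations are hypothetical, chosen only for illustration.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return DER from the three error durations and the total reference
    speech duration (all in seconds)."""
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return (false_alarm + missed_detection + confusion) / total


# Hypothetical durations in seconds, not taken from any real experiment.
components = {"false_alarm": 3.0, "missed_detection": 5.0, "confusion": 2.0}
der = diarization_error_rate(**components, total=100.0)
print(f"DER = {der:.1%}")
```

Because the terms are additive, a change in FA or MD shifts DER by the same amount at any level of evaluation, whereas the confusion term depends on how speakers are mapped.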
This paper might give you a better understanding of the whole pipeline.
@hbredin, thank you for your reply!
After reading the pyannote diarization pipeline paper, I have a better understanding of this problem, but something is still unclear to me. If I set the duration parameter to a value greater than the maximum duration of all files in the dataset, and omit the second clustering stage, will the resulting DiscreteDER be equivalent to the standard DER? In this case, clustering is no longer necessary to stitch chunks together.
Additionally, I came across a different DER calculation method on page 13 of this paper, which diverges from both DiscreteDER and the DER used in pyannote. This has led to some confusion on my end about how to achieve comparable DER metrics across different methods. Could you shed some light on this?
If I set the duration parameter to a value greater than the maximum duration of all files in the dataset, and omit the second clustering stage, will the resulting DiscreteDER be equivalent to the standard DER?
Yes.
Additionally, I came across a different DER calculation method on page 13 of this paper, which diverges from both DiscreteDER and the DER used in pyannote. This has led to some confusion on my end about how to achieve comparable DER metrics across different methods. Could you shed some light on this?
This is very similar (if not identical) to DiscreteDER.
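For readers comparing metrics, here is a rough sketch of what a frame-level ("discrete") DER computation can look like. It is an assumption-laden illustration, not the code used by pyannote or by the paper mentioned above: it takes binary frames-by-speakers activity grids with the same number of speaker columns on both sides, and searches all speaker permutations by brute force.

```python
from itertools import permutations


def discrete_der(reference, hypothesis):
    """Frame-level DER for two binary activity grids.

    Each grid is a sequence of frames; each frame is a tuple with one 0/1
    entry per speaker. Both grids must have the same shape.
    """
    num_speakers = len(reference[0])
    ref_count = [sum(frame) for frame in reference]
    hyp_count = [sum(frame) for frame in hypothesis]

    # Too many or too few active speakers in a frame.
    false_alarm = sum(max(h - r, 0) for r, h in zip(ref_count, hyp_count))
    missed = sum(max(r - h, 0) for r, h in zip(ref_count, hyp_count))

    # Best number of correctly attributed speaker-frames over all mappings.
    best_correct = max(
        sum(
            ref[s] * hyp[perm[s]]
            for ref, hyp in zip(reference, hypothesis)
            for s in range(num_speakers)
        )
        for perm in permutations(range(num_speakers))
    )
    matched = sum(min(r, h) for r, h in zip(ref_count, hyp_count))
    confusion = matched - best_correct

    return (false_alarm + missed + confusion) / sum(ref_count)


# Hypothetical 6-frame example with 2 speakers.
reference = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0)]
hypothesis = [(0, 1), (0, 1), (0, 1), (1, 0), (1, 0), (1, 0)]
print(f"frame-level DER = {discrete_der(reference, hypothesis):.3f}")
```

In this toy case the hypothesis labels are swapped relative to the reference, which the permutation search absorbs. The remaining error comes from one missed speaker in the overlapped frame and one false alarm in the silent frame.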
Tested versions
3.3.1
System information
Ubuntu 20.04
Issue description
When fine-tuning the PyanNet model, I observed that it calculates the DiscreteDiarizationErrorRate (DiscreteDER) on the validation set during training. However, this metric is assessed only at the chunk level, leading to discrepancies between the local DiscreteDER and the overall DiarizationErrorRate (DER). For instance, using a duration of 5 seconds resulted in a local DiscreteDER of 16% on the AMI development set, whereas the global DER was 17%. Increasing the duration to 20 seconds, however, resulted in a DiscreteDER of 20% and a DER of 15.5%.
This observation raises several questions:
Minimal reproduction example (MRE)
Can provide if needed