No, the two can be done independently of each other.
Yes, as long as your training data contains lots of different speakers.
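It depends on what you plan to do with the resulting speech turns. If you plan to apply hierarchical agglomerative clustering, you should prefer high purity (>90%), even at the cost of lower coverage: clustering can merge speech turns that were split too often, but it cannot split a turn that mixes two speakers.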
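To make the purity/coverage trade-off concrete, here is a minimal sketch in plain Python. It is not pyannote's implementation: the helper names (`overlap`, `purity`, `coverage`) and the toy segments are made up for illustration, and it assumes segments are non-overlapping `(start, end)` tuples in seconds.

```python
# Hypothetical helpers for illustration only; not part of pyannote.

def overlap(a, b):
    """Duration of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def purity(reference, hypothesis):
    """Fraction of hypothesis duration covered by the best-matching
    reference segment of each hypothesis segment."""
    total = sum(end - start for start, end in hypothesis)
    best = sum(max(overlap(h, r) for r in reference) for h in hypothesis)
    return best / total


def coverage(reference, hypothesis):
    """Dual of purity: swap the roles of reference and hypothesis."""
    return purity(hypothesis, reference)


# Two reference speech turns of 10 seconds each.
reference = [(0.0, 10.0), (10.0, 20.0)]

# Too many change points: every segment is pure, but the first turn is split.
over_segmented = [(0.0, 5.0), (5.0, 10.0), (10.0, 20.0)]

# A missed change point: one segment mixes both turns.
under_segmented = [(0.0, 20.0)]

print(purity(reference, over_segmented), coverage(reference, over_segmented))
print(purity(reference, under_segmented), coverage(reference, under_segmented))
```

Over-segmenting keeps purity at 1.0 while coverage drops to 0.75; missing a change point keeps coverage at 1.0 while purity drops to 0.5. The first kind of error is the one that clustering can still recover from.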
Hi,
I have a few questions:
Is 'speech activity detection' a prerequisite for 'speaker change detection'?
Is the current approach in pyannote speaker- and content-invariant?
When striking a balance between purity and coverage, what value of coverage is good enough in practice?
Regards,
Ankur