Open JiangN6 opened 6 months ago
Have you found any solution for improving the diarization accuracy? I am also having trouble getting correct speaker tags: sometimes a conversation between two distinct speakers is mixed up and shown under a single speaker.
If I set both the maximum and minimum number of speakers to 2, the speaker-tagging error rate skyrockets: a minute-long stretch of conversation between two people is assigned to a single speaker tag. If the number of speakers is not set, several extra speaker tags are identified. In both cases, short replies such as "hello" and "yes" are also frequently not given a proper speaker label.
Is there anything that can be done to solve or mitigate this problem?
When the number of speaker tags is not set, four speaker tags (00, 01, 02 and 03) are identified:

![Dingtalk_20231222161858](https://github.com/m-bain/whisperX/assets/110437979/fa4e2673-9e54-4862-9504-baee5eaf1cbf)
When the maximum and minimum number of speaker tags are both set to 2, two speaker tags (00 and 01) are recognized, but the accuracy is too poor to be usable:

![Dingtalk_20231222161942](https://github.com/m-bain/whisperX/assets/110437979/66be5830-2e2b-42ad-9675-540dad31b6f8)
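
To compare the two runs with numbers rather than screenshots, one option is to total the talk time per speaker tag. The following is only a rough sketch, not part of whisperX: it assumes the segments are dicts with `start`, `end`, and an optional `speaker` key (labels such as `SPEAKER_00`), and both the helper `speaker_talk_time` and the sample data are hypothetical.

```python
from collections import defaultdict


def speaker_talk_time(segments):
    """Sum the duration in seconds per speaker label.

    Segments without a "speaker" key are grouped under None, which makes
    unlabeled short replies ("hello", "yes") easy to spot.
    """
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.get("speaker")] += seg["end"] - seg["start"]
    return dict(totals)


# Hypothetical segments, assumed to resemble diarized transcript output.
segments = [
    {"start": 0.0, "end": 4.0, "speaker": "SPEAKER_00", "text": "How are you?"},
    {"start": 4.0, "end": 4.5, "text": "Yes."},  # short reply with no label
    {"start": 4.5, "end": 10.5, "speaker": "SPEAKER_01", "text": "Fine, thanks."},
    {"start": 10.5, "end": 12.5, "speaker": "SPEAKER_00", "text": "Good."},
]

totals = speaker_talk_time(segments)
for speaker, seconds in totals.items():
    print(speaker, seconds)
```

If one tag holds nearly all of the talk time in the run limited to 2 speakers, or the extra tags in the unconstrained run hold only a few seconds each, that gives a concrete measure of the mislabeling described above.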