Open Xiaobx-lab opened 1 month ago
You need to tune the threshold by yourself.
I'm implementing a feature that records meeting transcripts, but I don't know the number of speakers (clusters) in advance, and a single clustering threshold doesn't seem to work well across different audio files. What approach should I take?
I tried processing a long audio file with the '3dspeaker+segmentation.onnx' models you provided, but accuracy drops significantly during speaker diarization when the number of clusters is not specified. I also compared it with the pyannote speaker-diarization-3.1 model on Hugging Face, which seems to perform better. However, I'm not sure how to deploy that model on Android. Could you please advise me on how to solve this?