Open Demon-tk opened 1 year ago
cc @sanchit-gandhi
@Demon-tk If you need a workaround for the time being, I was able to make num_speakers, min_speakers, and max_speakers work with the following minor change in the diarize.py file:
)
Now, include any of these 3 arguments along with the audio file like this:
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-medium", device=device)
out = pipeline(input_vid_path, min_speakers=2)
Let me know if you have any questions.
@speechbox developers, let me know if you see anything wrong with this workaround. Thanks!
That's a valid workaround. What we can probably do is have separate kwargs for the diarization pipeline and for the ASR pipeline.
Would you like to open a PR @utility-aagrawal or @Demon-tk to add this support? It would look very similar to specific encoder-decoder kwargs that we have in transformers: https://github.com/huggingface/transformers/blob/dd8b7d28aec80013ad2b25ead4200eea1a6a767e/src/transformers/models/encoder_decoder/modeling_encoder_decoder.py#L458-L464
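For anyone picking this up, the prefix-based routing could be sketched roughly as below. This is only an illustration of the idea, not the speechbox or transformers implementation: the helper name `split_pipeline_kwargs`, the `diarization_` prefix, and the choice to send unprefixed arguments to the ASR pipeline are all assumptions.

```python
from typing import Any, Dict, Tuple


def split_pipeline_kwargs(
    kwargs: Dict[str, Any], prefix: str = "diarization_"
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split call-time kwargs into ASR and diarization kwargs.

    Arguments starting with `prefix` go to the diarization pipeline with the
    prefix stripped; everything else is assumed to belong to the ASR pipeline.
    """
    asr_kwargs = {
        name: value for name, value in kwargs.items() if not name.startswith(prefix)
    }
    diarization_kwargs = {
        name[len(prefix):]: value
        for name, value in kwargs.items()
        if name.startswith(prefix)
    }
    return asr_kwargs, diarization_kwargs


# Example: one prefixed argument and one unprefixed argument.
asr_kwargs, diarization_kwargs = split_pipeline_kwargs(
    {"diarization_num_speakers": 2, "batch_size": 8}
)
print(asr_kwargs)
print(diarization_kwargs)
```

With this kind of split, the pipeline's `__call__` would forward each dictionary to the matching sub-pipeline, so the two sets of options cannot collide.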
Thanks @sanchit-gandhi! I can do that for both issues #25 and #27.
@Demon-tk, I have added separate kwargs for the ASR and diarization pipelines. You should be able to specify the number of speakers in the ASRDiarizationPipeline now. Please note that you need to prefix the argument with 'diarization_' for it to be passed to the diarization pipeline:
pipeline = ASRDiarizationPipeline.from_pretrained("openai/whisper-medium", device=device)
out = pipeline(input_vid_path, diarization_num_speakers=2)
Please close this thread if there are no further questions/issues. Thanks!
Hi @speechbox developers,
I've been using the ASRDiarizationPipeline and noticed that there isn't a built-in option to specify the number of speakers when performing diarization. This feature would be very helpful for scenarios where the number of speakers is already known or can be estimated beforehand, as it can potentially improve the performance of the speaker diarization process.