Open TheMattBin opened 1 month ago
This is certainly feasible, as WhisperX offers diarization through its `DiarizationPipeline`:
```python
from whisperx import DiarizationPipeline, assign_word_speakers

# Run diarization on each audio file and attach speaker labels
# to the word-level transcription results
diarize_model = DiarizationPipeline(use_auth_token=hf_token, device=device)
for result, input_audio_path in tmp_results:
    diarize_segments = diarize_model(input_audio_path, min_speakers=min_speakers, max_speakers=max_speakers)
    result = assign_word_speakers(diarize_segments, result)
    results.append((result, input_audio_path))
```
That said, the way an LLM would interpret multiple speakers is unclear to me.
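One plain-text option would be to flatten the diarized segments into speaker-labelled transcript lines before passing them to the LLM. A minimal sketch, assuming the segments carry `"speaker"` and `"text"` keys in the shape WhisperX produces after `assign_word_speakers` (the `format_for_llm` helper and the sample data are hypothetical):

```python
def format_for_llm(segments):
    """Render diarized segments as one speaker-labelled line per segment."""
    lines = []
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")  # segments may lack a speaker label
        lines.append(f"{speaker}: {seg['text'].strip()}")
    return "\n".join(lines)

# Hypothetical diarized output, shaped like WhisperX result["segments"]
segments = [
    {"speaker": "SPEAKER_00", "text": " Hello, how are you?"},
    {"speaker": "SPEAKER_01", "text": " Fine, thanks."},
]
print(format_for_llm(segments))
```

The LLM then sees an interleaved dialogue rather than an unattributed wall of text, which most chat models handle naturally.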
Is there any plan to add a diarization feature? Thanks for the great work!