Machine Learning - Speaker Diarization with Python [Audio Processing] [Deep Learning]

rowhitswami commented 5 years ago

Introduction

Speaker diarization is the process of partitioning an input audio stream into homogeneous segments according to the speaker's identity. It answers the question "who spoke when?" It can enhance the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity. Speaker diarization combines two steps: speaker segmentation, which finds speaker change points in an audio stream, and speaker clustering, which groups speech segments together on the basis of speaker characteristics.
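To make the two steps concrete, here is a minimal toy sketch in plain Python. It is not a real diarization system: it assumes each audio frame has already been reduced to a single number (a stand-in for a speaker embedding), and the function names and both thresholds are hypothetical choices for illustration only.

```python
from statistics import mean


def segment(frames, change_threshold):
    """Speaker segmentation: start a new segment wherever two
    consecutive frame features differ by more than the threshold."""
    if not frames:
        return []
    boundaries = [0]
    for i in range(1, len(frames)):
        if abs(frames[i] - frames[i - 1]) > change_threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    # Each segment is a half-open frame range [start, end).
    return [(boundaries[k], boundaries[k + 1]) for k in range(len(boundaries) - 1)]


def cluster(frames, segments, same_speaker_threshold):
    """Speaker clustering: greedily assign each segment to the closest
    known speaker, or create a new speaker if none is close enough."""
    centroids = []  # one representative feature value per speaker
    labels = []
    for start, end in segments:
        seg_mean = mean(frames[start:end])
        best = None
        for idx, centroid in enumerate(centroids):
            dist = abs(seg_mean - centroid)
            if dist <= same_speaker_threshold and (
                best is None or dist < abs(seg_mean - centroids[best])
            ):
                best = idx
        if best is None:
            centroids.append(seg_mean)
            best = len(centroids) - 1
        labels.append(best + 1)  # speakers are numbered from 1
    return labels


def diarize(frames, change_threshold=1.0, same_speaker_threshold=0.5):
    """Run segmentation, then clustering, and return labeled turns."""
    segments = segment(frames, change_threshold)
    labels = cluster(frames, segments, same_speaker_threshold)
    return [(start, end, f"Speaker {label}")
            for (start, end), label in zip(segments, labels)]


# Made-up 1-D "features": low values for one voice, high values for another.
frames = [0.1, 0.2, 0.1, 2.0, 2.1, 1.9, 0.0, 0.2]
turns = diarize(frames)
print(turns)
# [(0, 3, 'Speaker 1'), (3, 6, 'Speaker 2'), (6, 8, 'Speaker 1')]
```

A real pipeline would replace the scalar features with learned speaker embeddings and the greedy assignment with a proper clustering algorithm, but the split into "find change points" and "group segments by speaker" stays the same.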

Task

The goal of this issue is to implement a solution that performs speaker diarization on the given file interview.mp3 and produces a transcript labeled by speaker.

Expected Output

Speaker 1 - dummy text dummy text dummy text dummy text

Speaker 2 - dummy text dummy text dummy text dummy text
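One possible way to produce this format, sketched under assumptions: suppose a diarization step has returned speaker turns as (start, end, speaker) tuples and a speech-to-text step has returned words as (start, end, text) tuples, with times in seconds. Both input shapes and all function names below are hypothetical, and the sample data is made up. Each word is assigned to the turn it overlaps most, and consecutive words from the same speaker are joined into one line.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(turns, words):
    """Attach to each word the speaker whose turn overlaps it most.
    A word that overlaps no turn falls back to the first turn."""
    labeled = []
    for w_start, w_end, text in words:
        best_turn = max(turns, key=lambda t: overlap(t[0], t[1], w_start, w_end))
        labeled.append((best_turn[2], text))
    return labeled


def format_transcript(labeled):
    """Merge consecutive words by the same speaker into one line each."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return "\n\n".join(f"{speaker} - {' '.join(tokens)}" for speaker, tokens in lines)


# Made-up example data, not taken from interview.mp3.
turns = [(0.0, 2.0, "Speaker 1"), (2.0, 4.5, "Speaker 2")]
words = [
    (0.1, 0.5, "Hello"),
    (0.6, 1.2, "there"),
    (2.1, 2.6, "Hi"),
    (2.7, 3.4, "everyone"),
]
transcript = format_transcript(label_words(turns, words))
print(transcript)
# Speaker 1 - Hello there
#
# Speaker 2 - Hi everyone
```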

End-to-end resource

https://github.com/wq2012/awesome-diarization

Choose any technique given in the above-mentioned repository.

srinath1999 commented 5 years ago

Hey @rowhitswami, I want to take up this task! But I need some time to read the material and write the code... I will give it my best.

rowhitswami commented 5 years ago

Yes, @srinath1999, you can take your time :blush:

kevins99 commented 5 years ago

Would like to take this up!

rowhitswami commented 5 years ago

@kevins99 feel free to work on it :blush:

rowhitswami commented 5 years ago

@srinath1999 @kevins99 any updates?
