pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Can I download a separate audio file for each speaker? #1014

Closed heunggyulee closed 2 years ago

heunggyulee commented 2 years ago

First of all, thank you for providing a good module.

I want to connect pyannote's speaker diarization to an STT module that I made. I uploaded a speech file, ran the pretrained model, and got a perfect diarization result :) Now I want to download a separate audio file for each speaker to feed into the STT module. Is there any way to download an MP3 or WAV file?

manish-kumar-iisc commented 2 years ago

As far as I know, you cannot download them directly. I am not familiar with your STT module, but one approach I can suggest is:

  1. Get the diarization timestamp boundaries (the output of pyannote).
  2. Use an ffmpeg command to cut the uploaded speech file according to the diarization timestamps.
  3. A speaker may not speak continuously, so you may get multiple MP3 files for each speaker.
  4. You can either merge the files that belong to the same speaker (using an ffmpeg command) or keep them separate.
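
The steps above can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not pyannote's own API: `build_cut_commands` and `concat_list` are hypothetical helpers, the segment list is made up, and file names such as `audio.wav` are placeholders. It only builds the ffmpeg argument lists, so the result can be inspected without ffmpeg installed.

```python
from collections import defaultdict


def build_cut_commands(segments, source, out_dir="."):
    """Return {speaker: [ffmpeg argv, ...]}, one command per speaker turn.

    `segments` is an iterable of (start_sec, end_sec, speaker_label).
    (Hypothetical helper, not part of pyannote.audio.)
    """
    commands = defaultdict(list)
    for start, end, speaker in segments:
        index = len(commands[speaker])  # running clip number for this speaker
        out_path = f"{out_dir}/{speaker}_{index:03d}.wav"
        commands[speaker].append([
            "ffmpeg", "-y",
            "-i", source,
            "-ss", f"{start:.3f}",  # cut start, in seconds
            "-to", f"{end:.3f}",    # cut end, in seconds
            out_path,
        ])
    return dict(commands)


def concat_list(commands_for_speaker):
    """Text of an ffmpeg concat-demuxer list file for one speaker's clips."""
    return "".join(f"file '{cmd[-1]}'\n" for cmd in commands_for_speaker)


# Made-up segments. With pyannote.audio they could be collected from a
# diarization result, assuming the usual iteration pattern:
#   [(turn.start, turn.end, spk)
#    for turn, _, spk in diarization.itertracks(yield_label=True)]
segments = [
    (3.0, 10.0, "SPEAKER_00"),
    (7.0, 15.0, "SPEAKER_01"),
    (16.0, 20.0, "SPEAKER_00"),
]
commands = build_cut_commands(segments, "audio.wav")
```

Each command could then be run with `subprocess.run(cmd, check=True)`. To merge one speaker's clips, the text from `concat_list` could be written to a file such as `list.txt` and passed to `ffmpeg -f concat -safe 0 -i list.txt -c copy SPEAKER_00.wav`. Clips cut from overlapping turns will still contain both voices.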

heunggyulee commented 2 years ago

Thank you for the replies.

If speaker1 speaks from 00:03 to 00:10 and speaker2 speaks from 00:07 to 00:15, the two voices will be mixed from 00:07 to 00:10.

But I want to get separated voice files:

file1: speaker1.mp3 (00:03 ~ 00:10), without speaker2's voice

file2: speaker2.mp3 (00:07 ~ 00:15), without speaker1's voice

Is this impossible in pyannote?
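
The timing in the example above can be worked through with a little interval arithmetic. The sketch below uses two hypothetical helpers, `overlap` and `exclusive_parts`, which are not part of pyannote. It shows which stretch contains both voices and which stretches contain only one speaker. Cutting by timestamps can isolate the single-speaker stretches, but the overlapping stretch stays mixed.

```python
def overlap(a, b):
    """Return the (start, end) overlap of two intervals, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def exclusive_parts(a, b):
    """Parts of interval `a` that are not covered by interval `b`."""
    common = overlap(a, b)
    if common is None:
        return [a]
    parts = []
    if a[0] < common[0]:
        parts.append((a[0], common[0]))  # portion of `a` before the overlap
    if common[1] < a[1]:
        parts.append((common[1], a[1]))  # portion of `a` after the overlap
    return parts


speaker1 = (3.0, 10.0)   # 00:03 to 00:10
speaker2 = (7.0, 15.0)   # 00:07 to 00:15

print(overlap(speaker1, speaker2))          # (7.0, 10.0)
print(exclusive_parts(speaker1, speaker2))  # [(3.0, 7.0)]
print(exclusive_parts(speaker2, speaker1))  # [(10.0, 15.0)]
```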

hbredin commented 2 years ago

No. pyannote does not do speaker separation; it focuses on diarization. For source separation, see asteroid.

heunggyulee commented 2 years ago

If I open the Pipeline module, can I get a speaker separation result?

hbredin commented 2 years ago

No. Again: pyannote.audio does not do (nor internally rely on) separation.