tyiannak / pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
Apache License 2.0

diarization #56

Open mosherayman opened 7 years ago

mosherayman commented 7 years ago

I have been experimenting with diarization of two-party phone calls. I am using real phone call recordings as well as calls "assembled" from various publicly available speech corpora. These are fast-paced phone calls without significant silence between the speakers.

The diarization code almost always gets the number of segments right. However, it consistently places the segment boundaries between 0.5 and 1.5 seconds earlier than the ground truth.

Is this expected? Which parameters can be adjusted to change where the segment boundaries fall?
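
To make the offset concrete, here is a minimal sketch of how the bias could be measured. The function names, the `max_gap` threshold, and the sample boundary times are all hypothetical and only for illustration; none of this is part of pyAudioAnalysis.

```python
def boundary_offsets(predicted, reference, max_gap=2.0):
    """Signed offsets (predicted - reference), in seconds.

    Each reference boundary is paired with its nearest predicted boundary.
    Pairs further apart than max_gap are ignored. A negative offset means
    the predicted boundary comes earlier than the ground truth.
    """
    offsets = []
    if not predicted:
        return offsets
    for ref in reference:
        nearest = min(predicted, key=lambda p: abs(p - ref))
        diff = nearest - ref
        if abs(diff) <= max_gap:
            offsets.append(diff)
    return offsets


def mean_offset(offsets):
    """Average signed offset, or 0.0 when nothing was matched."""
    return sum(offsets) / len(offsets) if offsets else 0.0


# Made-up boundary times (seconds), for illustration only.
reference = [10.0, 25.0, 40.0]
predicted = [9.0, 24.5, 38.5]

offsets = boundary_offsets(predicted, reference)
print(offsets, mean_offset(offsets))
```

A consistently negative mean would confirm a systematic early bias rather than random boundary jitter.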

maximveksler commented 7 years ago

I'm working on a similar feature. Could you share your code?

zarquon5 commented 7 years ago

I, too, have noticed this. With a small enough mtSize and mtStep, it does a pretty good job of finding segments, but the boundaries are where it suffers: it seems to regularly declare a transition one word or so earlier than it should. The current speakerDiarization() method has no parameters for changing models; the knnSpeakerAll and knnSpeakerFemaleMale models are hard-coded into the function. Is there any way, possibly through training, to build a better segmenter using ground-truth files? If so, can someone write up a recipe for how to do this?
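
As a stopgap that needs no retraining, the per-window labels could be turned into segments with a constant shift applied to every transition. The sketch below is an assumption, not a pyAudioAnalysis feature: `labels_to_segments` and `time_offset` are hypothetical names, `step` merely plays the role of mtStep, and the shift value would have to be measured against ground truth.

```python
def labels_to_segments(labels, step, time_offset=0.0):
    """Collapse per-window labels into (start, end, label) segments.

    labels:      one speaker label per mid-term window
    step:        hop between consecutive windows, in seconds
    time_offset: seconds added to every internal boundary (hypothetical
                 correction; positive values move boundaries later)
    """
    if not labels:
        return []
    total = len(labels) * step
    segments = []
    start = 0.0
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            # Shift the transition, keeping it inside the recording and
            # never before the start of the current segment.
            boundary = min(max(i * step + time_offset, start), total)
            segments.append((start, boundary, labels[i - 1]))
            start = boundary
    segments.append((start, total, labels[-1]))
    return segments


# Made-up labels: four windows of speaker 0, then four of speaker 1.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(labels_to_segments(labels, step=0.5))
print(labels_to_segments(labels, step=0.5, time_offset=1.0))
```

Note that boundaries can only fall on multiples of `step` before the shift, so a smaller step gives finer boundary resolution, while a constant shift only helps if the bias is roughly the same across transitions.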