taylorlu / Speaker-Diarization

speaker diarization by uis-rnn and speaker embedding by vgg-speaker-recognition
Apache License 2.0

pretrained uisrnn benchmark model #2

Closed · gen35 closed this issue 5 years ago

gen35 commented 5 years ago

What audio data did you use to train uisrnn model?

taylorlu commented 5 years ago

Please refer to https://github.com/taylorlu/Speaker-Diarization#dataset; you can use either of these datasets. Before training the uisrnn model, you should generate the embeddings for every speaker.
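For orientation, once the embeddings exist, the uisrnn training step itself is a standard fit call. A minimal sketch using Google's uis-rnn package; the `.npz` key names and the embedding dimension are assumptions here, so check what generate_embeddings.py actually writes:

```python
# Minimal sketch (not the repo's train.py): fit uis-rnn on
# previously generated embeddings.
import numpy as np
import uisrnn

model_args, training_args, inference_args = uisrnn.parse_arguments()
model_args.observation_dim = 512  # assumption: must match the ghostvlad embedding size

data = np.load('training_data.npz', allow_pickle=True)
train_sequence = data['train_sequence']       # assumed key: (n_windows, observation_dim)
train_cluster_id = data['train_cluster_id']   # assumed key: speaker label per window

model = uisrnn.UISRNN(model_args)
model.fit(train_sequence, train_cluster_id, training_args)
model.save('saved_uisrnn.pth')
```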

gen35 commented 5 years ago

I assume by using these datasets you are concatenating independent utterances. I wonder if it would be better to use smaller datasets with real dialogues. There are some free datasets: https://github.com/wq2012/awesome-diarization#datasets, but I haven't tested them yet.

taylorlu commented 5 years ago

Yes, real dialogues should be more suitable for training the model, since they capture the overlap information between adjacent windows. In uis-rnn, the embeddings seem to be shuffled against each other; you can read the uis-rnn code for more detail.

gen35 commented 5 years ago

Thanks for the reply.

Turan111 commented 5 years ago

How is it appropriate to train the uisrnn model with this dataset (https://github.com/taylorlu/Speaker-Diarization#dataset)? There are no speaker changes in this dataset.

taylorlu commented 5 years ago

Please read the code of https://github.com/taylorlu/Speaker-Diarization/blob/master/ghostvlad/generate_embeddings.py. I just concatenate the utterances of [10, 20] speakers after VAD and generate the embedding of each sliding window one by one. The final training data will therefore contain the speaker-change information.
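A condensed sketch of that scheme, assuming a dict of per-speaker waveforms; `vad`, `embed_window`, and the window/hop sizes are hypothetical placeholders, not the script's actual helpers or values:

```python
# Sketch of the idea described above, not the actual generate_embeddings.py:
# concatenate VAD-trimmed utterances from a random group of 10-20 speakers,
# then embed each sliding window. `vad` and `embed_window` are hypothetical
# stand-ins for the repo's VAD and ghostvlad inference code.
import random
import numpy as np

def build_training_sequence(speakers, win=400, hop=200):
    """speakers: dict mapping speaker id -> list of 1-D waveform arrays."""
    group = random.sample(list(speakers), k=random.randint(10, 20))
    signal, labels = [], []
    for spk in group:
        for utter in speakers[spk]:
            clean = vad(utter)                 # keep voiced samples only
            signal.append(clean)
            labels.extend([spk] * len(clean))  # per-sample speaker label
    signal = np.concatenate(signal)

    seq, seq_labels = [], []
    for start in range(0, len(signal) - win, hop):
        seq.append(embed_window(signal[start:start + win]))
        # label each window by its majority speaker
        window_labels = labels[start:start + win]
        seq_labels.append(max(set(window_labels), key=window_labels.count))
    return np.stack(seq), seq_labels
```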

Turan111 commented 5 years ago

Many thanks for the reply.

giorgionanfa commented 5 years ago

Sorry @taylorlu, I would like to clarify a point. If I want to use my own dataset, I first run generate_embeddings.py to create training_data.npz, and then go on to run train.py and speakerDiarization.py. In generate_embeddings.py I obviously change the path of the dataset, but should I also change pretrained/weights.h5, or not?

Thank you in advance

taylorlu commented 5 years ago

pretrained/weights.h5 has no relationship with the dataset. Once you have trained the ghostvlad model (for speaker recognition), it supports open-set data, so you can use new speakers outside the ghostvlad training dataset.
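To illustrate the open-set point: with fixed weights, the network can still compare speakers it never saw during training. A sketch, where `load_ghostvlad` and `wav_to_spectrogram` are hypothetical wrappers around the repo's ghostvlad code:

```python
# Illustration only: the pretrained embedder maps ANY utterance to a
# d-vector, including speakers absent from its training set.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

net = load_ghostvlad('pretrained/weights.h5')   # trained once, reused as-is
e1 = net.predict(wav_to_spectrogram('alice_1.wav'))[0]  # unseen speaker
e2 = net.predict(wav_to_spectrogram('alice_2.wav'))[0]  # same unseen speaker
e3 = net.predict(wav_to_spectrogram('bob_1.wav'))[0]    # different unseen speaker

# Same unseen speaker should score higher than different unseen speakers.
print(cosine(e1, e2), cosine(e1, e3))
```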

giorgionanfa commented 5 years ago

Ok, thanks

SanaullahOfficial commented 2 years ago

Hi, I am trying to train on my own dataset (a bunch of audio files), but I am getting an error while running generate_embeddings.py to create the training_data.npz file: `Not a directory: 'Dataset/.nfs000000010433337000000001/*.wav'`

updated part:

```python
def prepare_data(SRC_PATH):
    wavDir = os.listdir(SRC_PATH)
    wavDir.sort()

    allpath_list = []
    allspk_list = []
    for i, spkDir in enumerate(wavDir):   # each speaker's directory
        spk = spkDir    # speaker name
        wavPath = os.path.join(SRC_PATH, spkDir, '*.wav')
        for wav in os.listdir(wavPath):   # wav file
            utter_path = os.path.join(wavPath, wav)
            allpath_list.append(utter_path)
            allspk_list.append(i)
        if i > 100:
            break

    path_spk_list = list(zip(allpath_list, allspk_list))
    return path_spk_list
```

It would be great if you could suggest some possible ways to resolve this issue.
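For reference, `os.listdir` cannot expand a glob pattern such as `.../*.wav` (hence the "Not a directory" error), and stale NFS lock files like `.nfs000000010433337000000001` get picked up as if they were speaker directories. A possible fix, as a sketch assuming one sub-directory per speaker: expand the pattern with `glob` and keep only real directories:

```python
import glob
import os

def prepare_data(SRC_PATH):
    # Only keep real speaker directories; this skips stale NFS lock
    # files such as .nfs000000010433337000000001.
    wavDir = sorted(d for d in os.listdir(SRC_PATH)
                    if os.path.isdir(os.path.join(SRC_PATH, d)))

    allpath_list = []
    allspk_list = []
    for i, spkDir in enumerate(wavDir):        # each speaker's directory
        pattern = os.path.join(SRC_PATH, spkDir, '*.wav')
        for utter_path in glob.glob(pattern):  # expand the pattern; don't listdir it
            allpath_list.append(utter_path)
            allspk_list.append(i)
        if i > 100:
            break

    return list(zip(allpath_list, allspk_list))
```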