Closed gen35 closed 5 years ago
Please refer to https://github.com/taylorlu/Speaker-Diarization#dataset — you can use either of these datasets. Before training the uisrnn model, you should generate the embeddings for every speaker.
I assume by using these datasets you are concatenating independent utterances. I wonder if it would be better to use smaller datasets with real dialogues. There are some free datasets: https://github.com/wq2012/awesome-diarization#datasets, but I haven't tested them yet.
Yes, real dialogues should be more suitable for training the model, since it considers the overlapping information of adjacent windows. In uis-rnn, the embeddings appear to be shuffled; you can read the uis-rnn code for more detail.
Thanks for the reply.
Is it appropriate to train the uisrnn model with this dataset: https://github.com/taylorlu/Speaker-Diarization#dataset? There are no speaker changes in this dataset.
Please read the code of https://github.com/taylorlu/Speaker-Diarization/blob/master/ghostvlad/generate_embeddings.py. I simply concatenate the utterances of [10, 20] speakers after VAD and generate the embeddings of each sliding window one by one. The final training data will therefore contain the speaker-change information.
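The concatenation step described above can be sketched roughly as follows. This is a toy illustration, not the repo's actual code; `make_dialogue` and `utterances_by_speaker` are hypothetical names, and the real script works on embeddings of sliding windows rather than raw signals:

```python
import random
import numpy as np

def make_dialogue(utterances_by_speaker, num_speakers=(10, 20)):
    """Toy sketch: concatenate utterances from several speakers so that
    the resulting sequence contains speaker-change points, which is the
    structure uis-rnn needs to learn from.

    `utterances_by_speaker` maps speaker id -> list of 1-D signal arrays.
    Returns the concatenated signal and one speaker label per utterance.
    """
    k = random.randint(*num_speakers)
    # Pick a random subset of speakers for this synthetic "dialogue".
    chosen = random.sample(list(utterances_by_speaker),
                           min(k, len(utterances_by_speaker)))
    signal_parts, labels = [], []
    for spk in chosen:
        for utt in utterances_by_speaker[spk]:
            signal_parts.append(utt)
            labels.append(spk)  # one label per concatenated utterance
    return np.concatenate(signal_parts), labels
```

The label sequence is what carries the speaker-change information into training: wherever consecutive labels differ, the model sees a change point.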
Many thanks for the reply.
Sorry @taylorlu, I would like to clarify a point. If I want to use my own dataset, I first run generate_embeddings.py to create training_data.npz, and then run train.py and speakerDiarization.py. In generate_embeddings.py I obviously change the path of the dataset, but should I also change pretrained/weights.h5, or not?
Thank you in advance
pretrained/weights.h5 has no relationship with the dataset. Once you have trained the ghostvlad model (for speaker recognition), it supports open-set data, so you can use new speakers outside the ghostvlad training dataset.
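A minimal sketch of why this works: the trained embedder is just a fixed function from audio windows to embedding vectors, so it can embed speakers it never saw during training (open-set). `embed` and `model` below are hypothetical names for illustration, not the repo's API:

```python
import numpy as np

def embed(model, windows):
    """Map each fixed-size input window to an L2-normalised embedding.

    `model` stands in for the pretrained ghostvlad network: any callable
    that returns an (n_windows, dim) feature matrix. Because the weights
    are frozen, the same function applies to unseen speakers.
    """
    feats = np.asarray(model(windows))
    # Normalise so embeddings can be compared by cosine similarity.
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)
```

New speakers are then handled downstream by comparing their normalised embeddings, not by retraining the embedder.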
Ok, thanks
Hi, I am trying to train on my own dataset (a bunch of audio files), but I am getting errors while running generate_embeddings.py to create the training_data.npz file.
Error:

```
Not a directory: 'Dataset/.nfs000000010433337000000001/*.wav'
```
Updated part:

```python
def prepare_data(SRC_PATH):
    wavDir = os.listdir(SRC_PATH)
    wavDir.sort()

    allpath_list = []
    allspk_list = []
    for i, spkDir in enumerate(wavDir):  # each speaker's directory
        spk = spkDir  # speaker name
        wavPath = os.path.join(SRC_PATH, spkDir, '*.wav')
        for wav in os.listdir(wavPath):  # wav file
            utter_path = os.path.join(wavPath, wav)
            allpath_list.append(utter_path)
            allspk_list.append(i)
        if i > 100:
            break
    path_spk_list = list(zip(allpath_list, allspk_list))
    return path_spk_list
```
It would be great if you could suggest some possible ways to resolve this issue.
What audio data did you use to train the uisrnn model?