taylorlu / Speaker-Diarization

speaker diarization by uis-rnn and speaker embedding by vgg-speaker-recognition
Apache License 2.0
464 stars 121 forks

About the training data #36

Open soliloquy1983 opened 4 years ago

soliloquy1983 commented 4 years ago

Hi, I am a newbie and I have two questions:

1) Is the path of the training data the "SRC_PATH" in generate_embeddings.py, where the directory name indicates the speaker_id?
2) Currently, I train only on a short-utterance dataset like LibriSpeech (less than 10 s per utterance). However, the paper trains on two off-domain datasets, the 2000 NIST Speaker Recognition Evaluation and the ICSI Meeting Corpus, which are long-speech datasets. How can I use them with this code?

Thanks a lot!
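To make question 1 concrete, here is a minimal sketch of the layout I am assuming: a root directory whose first-level subdirectory names are the speaker IDs, each holding that speaker's .wav files. The function name `group_wavs_by_speaker` and the `.wav` extension are my own assumptions for illustration, not something taken from generate_embeddings.py, so the actual script may expect something different.

```python
from collections import defaultdict
from pathlib import Path


def group_wavs_by_speaker(src_path):
    """Map each first-level subdirectory name (assumed to be a speaker_id)
    to the sorted list of .wav files found anywhere beneath it.

    Hypothetical helper: this only illustrates the assumed layout
    src_path/<speaker_id>/.../*.wav and is not part of the repository.
    """
    root = Path(src_path)
    groups = defaultdict(list)
    # Each direct child directory of the root is treated as one speaker.
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Collect that speaker's audio files, including nested folders.
        for wav in sorted(speaker_dir.rglob("*.wav")):
            groups[speaker_dir.name].append(wav)
    # Speakers with no .wav files do not appear in the result.
    return dict(groups)
```

Calling `group_wavs_by_speaker(SRC_PATH)` on such a tree would return one entry per speaker directory, which is how I understand "the directory indicates the speaker_id".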