Closed 9crk closed 7 years ago
@9crk, I think the easiest thing to do is to use one of the speaker verification recipes as a starting point. E.g., look at egs/sre10/v1 and try to follow what the run.sh is doing. Training the classification system should be identical to that recipe. The main thing you have to change is how the PLDA model is used. You can extract an i-vector for each test cut, and use PLDA scoring to compare it with all of your speaker models. Then pick the one it's closest to. If you only have a few people (e.g., ~10), this exhaustive search should be fast.
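The exhaustive search described above can be sketched in a few lines of Python. This is only an illustration, not Kaldi code: it assumes the i-vectors have already been extracted (e.g., with the sre10/v1 recipe), and it uses cosine similarity as a stand-in for the PLDA scoring that `ivector-plda-scoring` would do in practice. The speaker names and vectors are made up for the example.

```python
import math

def cosine_score(a, b):
    # Cosine similarity between two i-vectors (stand-in for PLDA scoring).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def classify(test_ivector, speaker_models):
    # Score the test i-vector against every enrolled speaker model
    # and return the name of the highest-scoring speaker.
    return max(speaker_models,
               key=lambda spk: cosine_score(test_ivector, speaker_models[spk]))

# Hypothetical enrolled speaker models (in reality, i-vectors have
# hundreds of dimensions; 3-d vectors are used here for readability).
models = {
    "spk_a": [1.0, 0.0, 0.0],
    "spk_b": [0.0, 1.0, 0.0],
    "spk_c": [0.0, 0.0, 1.0],
}

print(classify([0.9, 0.1, 0.0], models))  # closest to spk_a
```

With ~10 speakers this loop is negligible in cost; the expensive parts are the feature extraction and i-vector extraction, which the recipe's scripts handle.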
Hello, administrators! It's really awesome that you guys made this toolkit.
I'm new to Kaldi, and I was attempting to make a text-dependent speaker classifier (not verification) that can tell apart 10+ people.
I've learned about the MFCC, GMM, FFT, and clustering stuff, and tried to build one myself, but it works badly.
I heard that i-vectors could do this job. Although I know the speech recognition flow, Kaldi is still very complex for me. I've only done the yesno example, and it didn't help me understand how to use i-vectors.
Can you give me some advice?