Closed youngsuenXMLY closed 4 years ago
Hello, I trained on VCTK, and the training process looks like this. The VCTK dataset has 100+ speakers, and each speaker has several utterances. What if there is only one utterance per speaker?
Each speaker in the VCTK corpus has a few hundred samples. I think if you take only one utterance per speaker, you won't have enough data to train.
What about the situation where we have enough utterances overall, but only 1 or 2 utterances per speaker?
Hi, that's a good question. I haven't tried a dataset like this, so I don't know how it will perform. I suppose it would have difficulty with speaker classification and speaker adversarial learning, because our model would need a very large softmax output layer, i.e. one output per speaker.
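To give a rough sense of the scaling, here is a minimal sketch that counts the parameters of a dense softmax layer as the number of speakers grows. The helper `softmax_layer_params` and the value `hidden_dim = 256` are hypothetical and chosen only for illustration; they are not taken from the model discussed here.

```python
def softmax_layer_params(hidden_dim: int, num_speakers: int) -> int:
    """Weights plus biases of a dense layer mapping hidden_dim -> num_speakers."""
    return hidden_dim * num_speakers + num_speakers


# Hypothetical input size for the speaker classifier (an assumption).
hidden_dim = 256

# Roughly VCTK-sized speaker count versus much larger speaker sets.
for num_speakers in (100, 10_000, 1_000_000):
    n = softmax_layer_params(hidden_dim, num_speakers)
    print(f"{num_speakers:>9} speakers -> {n:>12,} parameters")
```

The layer grows linearly with the number of speakers, and with only 1 or 2 utterances per speaker each output class would also see only 1 or 2 training examples.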