jxzhanggg / nonparaSeq2seqVC_code

Implementation code of non-parallel sequence-to-sequence VC
MIT License

How much time is the training dataset? #30

Closed: youngsuenXMLY closed this issue 4 years ago

youngsuenXMLY commented 4 years ago

Hello, I trained the model on VCTK, and the training process looks like this: [image]. The VCTK dataset has 100+ speakers, each with several utterances. What if each speaker had only one utterance?

huukim136 commented 4 years ago


Each speaker in the VCTK corpus has a few hundred samples. If you take only one utterance per speaker, I think you won't have enough data to train.
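To check whether a dataset falls into this situation, one can count utterances per speaker before training. The sketch below is hypothetical: it assumes VCTK-style file names of the form `<speaker>_<index>.wav` (e.g. `wav48/p225/p225_001.wav`) and a caller-chosen threshold `min_utts`. Neither the helper names nor the threshold come from this repository.

```python
import os
from collections import Counter


def utterances_per_speaker(paths):
    """Count utterances per speaker, assuming VCTK-style names like p225_001.wav."""
    # Assumption: the speaker ID is everything before the first underscore
    return Counter(os.path.basename(p).split("_", 1)[0] for p in paths)


def sparse_speakers(counts, min_utts):
    """Return speakers with fewer than min_utts utterances, sorted by ID."""
    return sorted(spk for spk, n in counts.items() if n < min_utts)


if __name__ == "__main__":
    files = [
        "wav48/p225/p225_001.wav",
        "wav48/p225/p225_002.wav",
        "wav48/p226/p226_001.wav",
    ]
    counts = utterances_per_speaker(files)
    print(dict(counts))                # {'p225': 2, 'p226': 1}
    print(sparse_speakers(counts, 2))  # ['p226']
```

Speakers returned by `sparse_speakers` are the ones that would contribute almost nothing per class to speaker-dependent training objectives.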

youngsuenXMLY commented 4 years ago

What about the case where we have enough utterances in total, but only 1 or 2 utterances per speaker?

jxzhanggg commented 4 years ago

Hi, that's a good question. I haven't tried a dataset like this, so I don't know how it would perform. I suppose it would have difficulty with speaker classification and speaker adversarial learning, because our model would need a super-large softmax output layer, i.e., one whose size equals the number of speakers.
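A rough way to see why a very large speaker count strains the speaker classifier is to compare two hypothetical datasets with the same total number of utterances. This sketch assumes a linear speaker-classification head from a 256-dimensional embedding to one softmax output per speaker (the 256 and the 40,000-utterance total are illustrative guesses, not this model's actual settings). It reports the head's parameter count, the training examples each class sees, and the cross-entropy of a uniform guess, which is ln(N) for N speakers.

```python
import math


def speaker_head_stats(total_utts, utts_per_speaker, hidden_dim=256):
    """Size a hypothetical linear + softmax speaker classifier for a dataset.

    hidden_dim is an illustrative assumption, not the repository's setting.
    """
    n_speakers = total_utts // utts_per_speaker
    return {
        "n_speakers": n_speakers,
        # weight matrix (hidden_dim x n_speakers) plus one bias per speaker
        "head_params": hidden_dim * n_speakers + n_speakers,
        # how many training utterances each softmax class sees
        "examples_per_class": utts_per_speaker,
        # loss (in nats) of predicting every speaker with equal probability
        "uniform_ce": round(math.log(n_speakers), 3),
    }


if __name__ == "__main__":
    # Same 40,000 utterances, split two ways (both splits are hypothetical)
    for per_spk in (400, 2):
        print(per_spk, speaker_head_stats(40_000, per_spk))
```

With 400 utterances per speaker, the head has 25,700 parameters and each class sees 400 examples. With 2 per speaker, the head grows to 5,140,000 parameters while each class sees only 2 examples, and the uniform-guess loss rises from about 4.605 to about 9.903 nats.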