jxzhanggg / nonparaSeq2seqVC_code

Implementation code of non-parallel sequence-to-sequence VC
MIT License

Is there any method to predict speaker code embedding? #12

Closed: youngsuenXMLY closed this issue 4 years ago

youngsuenXMLY commented 4 years ago

The one-hot speaker embedding is simple, but it is only applicable in limited scenarios. Is there any method for obtaining a universal speaker embedding?

jxzhanggg commented 4 years ago

Yes.
In our method, a speaker encoder is used in the pre-training stage. After you finish pre-training, you can, in theory, infer a speaker code by passing a Mel-spectrogram through the speaker encoder.
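A minimal sketch of that inference step follows. It assumes a hypothetical `ToySpeakerEncoder` (random weights standing in for pre-trained ones, a per-frame projection with tanh, mean pooling over time, then L2 normalization) in place of the repository's actual speaker encoder, whose architecture and API are not described in this thread. It only illustrates the shape of the operation: a variable-length Mel-spectrogram goes in, a fixed-size speaker code comes out.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


class ToySpeakerEncoder:
    """Hypothetical stand-in for a pre-trained speaker encoder (not the repo's model)."""

    def __init__(self, n_mels: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random weights stand in for parameters learned during pre-training.
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(n_mels)] for _ in range(dim)]

    def __call__(self, mel: Matrix) -> List[float]:
        """Map a (frames x n_mels) Mel-spectrogram to one unit-norm speaker code."""
        dim = len(self.weight)
        pooled = [0.0] * dim
        for frame in mel:
            for d, row in enumerate(self.weight):
                # Per-frame linear projection followed by a tanh nonlinearity.
                pooled[d] += math.tanh(sum(w * x for w, x in zip(row, frame)))
        # Average over time so the code does not depend on utterance length.
        pooled = [p / len(mel) for p in pooled]
        norm = math.sqrt(sum(p * p for p in pooled)) or 1.0
        return [p / norm for p in pooled]


# Usage: a fake 120-frame, 80-bin Mel-spectrogram yields a 64-dim speaker code.
rng = random.Random(1)
mel = [[rng.random() for _ in range(80)] for _ in range(120)]
code = ToySpeakerEncoder(n_mels=80, dim=64)(mel)
print(len(code))  # 64
```

Because of the mean pooling, any utterance, including one from an unseen speaker, produces a code of the same size; whether that code is a useful speaker representation depends entirely on how well the encoder was trained.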

However, as described in our paper, we introduce a speaker embedding during fine-tuning, and the output of the speaker encoder is only used to initialize the weights of that speaker embedding. We chose this approach because we found it produced better results. Perhaps our speaker encoder is not powerful enough to provide a universal speaker embedding. Also, as far as I know, datasets with thousands of speakers are often used to extract d-vectors, so the training data is another important factor to take into account.
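The initialization idea can be sketched as follows, under a labeled assumption: each training speaker's row of the embedding table is set to the average of the speaker-encoder outputs over that speaker's utterances. The function names `init_embedding_table` and `embed_one_hot` are hypothetical and not taken from the repository.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def init_embedding_table(
    codes_by_speaker: Dict[str, List[Vector]],
) -> Tuple[List[str], List[Vector]]:
    """Build one embedding row per training speaker from speaker-encoder outputs.

    Averaging each speaker's utterance-level codes is an assumption made for
    this sketch; the rows are only a starting point for fine-tuning.
    """
    speakers = sorted(codes_by_speaker)  # row index == one-hot index
    table: List[Vector] = []
    for spk in speakers:
        codes = codes_by_speaker[spk]
        dim = len(codes[0])
        table.append([sum(c[d] for c in codes) / len(codes) for d in range(dim)])
    return speakers, table


def embed_one_hot(table: List[Vector], one_hot: Vector) -> Vector:
    """Multiply a one-hot speaker vector by the table (equivalent to a row lookup)."""
    dim = len(table[0])
    return [sum(h * row[d] for h, row in zip(one_hot, table)) for d in range(dim)]


codes = {
    "spk_b": [[0.0, 1.0], [0.0, 3.0]],
    "spk_a": [[1.0, 0.0], [3.0, 0.0]],
}
speakers, table = init_embedding_table(codes)
print(speakers)                      # ['spk_a', 'spk_b']
print(table)                         # [[2.0, 0.0], [0.0, 2.0]]
print(embed_one_hot(table, [0, 1]))  # [0.0, 2.0]
```

During fine-tuning the table would then be treated as a trainable parameter (in PyTorch, for example, by copying it into the weights of a `torch.nn.Embedding`). Since a one-hot vector only selects an existing row, this conditioning covers the training speakers and nothing else, which is the limitation raised in the question.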