Is there any method to predict speaker code embedding?

Yes.
In our method, a speaker encoder is used in the pre-training stage. After pre-training finishes, you can in theory infer a speaker code by passing a Mel-spectrogram through the speaker encoder.

However, as described in our paper, we introduce a speaker embedding during fine-tuning, and the output of the speaker encoder is used only to initialize the weights of that embedding. We found that this produced better results. Our speaker encoder may not be powerful enough to give a universal speaker embedding. Also, as far as I know, d-vector extractors are usually trained on datasets with thousands of speakers, so the training data is another important factor to take into account.
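
To make the idea concrete, here is a rough sketch of how encoder outputs could seed a speaker embedding table before fine-tuning. It is not the code from this repository: `toy_speaker_encoder` is a hypothetical stand-in (mean pooling plus a random projection) for the pre-trained speaker encoder, and averaging the codes over each speaker's utterances is my assumption about how one might get a single initial vector per speaker.

```python
import numpy as np

def toy_speaker_encoder(mel, proj):
    """Hypothetical stand-in for a pre-trained speaker encoder.

    mel:  (frames, n_mels) Mel-spectrogram
    proj: (n_mels, dim) projection matrix
    Returns a unit-length speaker code of shape (dim,).
    """
    pooled = mel.mean(axis=0)              # average over time frames
    code = pooled @ proj                   # project to the embedding size
    return code / (np.linalg.norm(code) + 1e-8)

def init_speaker_table(utterances, encoder):
    """Build initial embedding weights, one row per speaker.

    utterances: dict mapping speaker id -> list of Mel-spectrograms.
    Assumption: each row is the mean of that speaker's encoder outputs.
    """
    speakers = sorted(utterances)
    rows = [np.mean([encoder(m) for m in utterances[s]], axis=0)
            for s in speakers]
    return speakers, np.stack(rows)

# Random data standing in for real Mel-spectrograms.
rng = np.random.default_rng(0)
n_mels, dim = 80, 16
proj = rng.standard_normal((n_mels, dim))
utterances = {
    "spk_a": [rng.standard_normal((120, n_mels)),
              rng.standard_normal((95, n_mels))],
    "spk_b": [rng.standard_normal((200, n_mels))],
}

speakers, table = init_speaker_table(
    utterances, lambda m: toy_speaker_encoder(m, proj))
# `table` would be copied into the trainable speaker embedding,
# which is then updated during fine-tuning.
```

In this sketch the encoder is only used once, to produce starting values; after that the embedding rows are free parameters, which matches the role described above.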

jxzhanggg / nonparaSeq2seqVC_code

Is there any method to predict speaker code embedding? #12