Hi,
Has anyone tried using a d-vector as the speaker embedding instead of a one-hot vector? When I tried d-vectors, the latent embedding vectors were poorly utilized: the model converged to using only a single embedding vector. Any ideas on how to solve this?
Thanks, Burnie