Open arthavmane opened 4 years ago
Hi @arthavmane, did you find a way to get the d-vectors? I am working on a similar project.
I haven't tried UIS-RNN yet and only found this library yesterday, but I can extract the embeddings with: _, EMBEDS, wav_splits = encoder.embed_utterance(wav, return_partials=True)
@davodesign84 actually, this command doesn't output a single 256-element array. The EMBEDS variable has shape (#,256), but it should be (1,256). I think it first splits the audio into segments and then computes an embedding for each segment, whereas I want one embedding for the entire audio. Any clue how to do that?
Actually, I figured it out. @davodesign84 and @zhs105, you can just call embed = encoder.embed_utterance(wav)
and it'll give you a (1,256) array, which is the embedding for that specific wav file.
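If you already have the per-segment array from return_partials=True and want to collapse it yourself, here is a minimal sketch. It assumes the utterance-level d-vector is the L2-normalised mean of the partial embeddings, which is a common convention but not something confirmed in this thread for this encoder. The function name utterance_embedding and the random stand-in data are hypothetical.

```python
import numpy as np


def utterance_embedding(partial_embeds):
    """Collapse (n_partials, dim) partial embeddings into one (1, dim) vector.

    Assumption: the utterance embedding is the L2-normalised mean of the
    partials. Check your encoder's source before relying on this.
    """
    partial_embeds = np.asarray(partial_embeds, dtype=np.float64)
    if partial_embeds.ndim != 2 or partial_embeds.shape[0] == 0:
        raise ValueError("expected a non-empty (n_partials, dim) array")

    mean_embed = partial_embeds.mean(axis=0)   # average over segments
    norm = np.linalg.norm(mean_embed)
    if norm == 0:
        raise ValueError("mean embedding has zero norm")

    # Add a leading axis so the result has shape (1, dim)
    return (mean_embed / norm)[np.newaxis, :]


# Random stand-in for the (#, 256) EMBEDS array from the comment above
rng = np.random.default_rng(0)
fake_partials = rng.normal(size=(7, 256))

embed = utterance_embedding(fake_partials)
print(embed.shape)  # (1, 256)
```

In practice, calling embed_utterance(wav) without return_partials is simpler; this only helps if you need both the partials and the single vector.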
I'm working on a project in which I want to use d-vector embeddings to train a model. Can someone please explain how to compute d-vectors for different utterances from different speakers so that I can pass them into the UIS-RNN model?
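For the step of passing embeddings to UIS-RNN, a rough sketch of stacking per-speaker embeddings into training arrays might look like the following. It assumes that UIS-RNN expects a 2-D float array of concatenated observations plus a 1-D array of string speaker labels of the same length; that is my reading of its README, so please verify it there. The helper build_uisrnn_inputs, the speaker names, and the random data are all hypothetical, and the sketch stops before calling UIS-RNN itself.

```python
import numpy as np


def build_uisrnn_inputs(embeds_by_speaker):
    """Stack embeddings into (train_sequence, train_cluster_id).

    embeds_by_speaker maps a speaker label to a list of arrays, each of
    shape (n_i, dim) or (dim,). Hypothetical helper, not part of any library.
    """
    sequences, cluster_ids = [], []
    for speaker, utterances in embeds_by_speaker.items():
        for embeds in utterances:
            # Promote a single (dim,) vector to (1, dim)
            embeds = np.atleast_2d(np.asarray(embeds, dtype=np.float64))
            sequences.append(embeds)
            # One label per embedding row
            cluster_ids.extend([str(speaker)] * embeds.shape[0])

    if not sequences:
        raise ValueError("no embeddings provided")

    train_sequence = np.concatenate(sequences, axis=0)
    train_cluster_id = np.array(cluster_ids)
    return train_sequence, train_cluster_id


# Random stand-ins for real d-vectors (e.g. the (#, 256) partial embeddings)
rng = np.random.default_rng(0)
data = {
    "speaker_a": [rng.normal(size=(5, 256)), rng.normal(size=(3, 256))],
    "speaker_b": [rng.normal(size=(4, 256))],
}

train_sequence, train_cluster_id = build_uisrnn_inputs(data)
print(train_sequence.shape, train_cluster_id.shape)  # (12, 256) (12,)
```

Note that this simply concatenates utterances speaker by speaker. For diarization training you would probably want the rows ordered as the speakers actually alternate in a conversation, which this sketch does not attempt.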