resemble-ai / Resemblyzer

A python package to analyze and compare voices with deep learning
Apache License 2.0

d-Vectors for UIS-RNN #29

Open arthavmane opened 4 years ago

arthavmane commented 4 years ago

I'm working on a project in which I want to use d-vector embeddings to train a model. Can someone please explain how to compute d-vectors for different utterances from different speakers, so that I can pass them into the UIS-RNN model?

zhs105 commented 3 years ago

Hi @arthavmane, did you find a way to get the d-vectors? I am working on a similar project.

davide-scalzo commented 3 years ago

I haven't tried UIS-RNN yet and only found this library yesterday, but I can extract the embeddings with: _, EMBEDS, wav_splits = encoder.embed_utterance(wav, return_partials=True)
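For anyone who wants the partial embeddings as a training sequence, here is a minimal sketch of how the call above could be wrapped. The helper names `partial_dvectors` and `stack_for_training` are hypothetical, `encoder` is assumed to behave like Resemblyzer's `VoiceEncoder`, and the layout of one embedding row per window plus one speaker label per row is an assumption about what UIS-RNN expects, so check it against the UIS-RNN documentation before relying on it.

```python
import numpy as np


def partial_dvectors(encoder, wav):
    """Return (partials, wav_splits) for one preprocessed waveform.

    `encoder` is assumed to expose embed_utterance(wav, return_partials=True)
    the way resemblyzer.VoiceEncoder does in the comment above.
    """
    _, partials, wav_splits = encoder.embed_utterance(wav, return_partials=True)
    return np.asarray(partials), wav_splits


def stack_for_training(partials_per_utterance, speaker_ids):
    """Concatenate per-utterance partial embeddings into one 2-D sequence.

    Each speaker label is repeated once per partial embedding so that the
    labels line up row by row with the sequence (assumed training layout).
    """
    sequence = np.concatenate(partials_per_utterance, axis=0)
    labels = [
        speaker
        for partials, speaker in zip(partials_per_utterance, speaker_ids)
        for _ in range(len(partials))
    ]
    return sequence, labels
```

With Resemblyzer installed, `encoder` would be `VoiceEncoder()` and `wav` the result of `preprocess_wav(path)`, both imported from `resemblyzer`.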

saumyaborwankar commented 3 years ago

@davodesign84 actually, this command doesn't output a single 256-element array; the EMBEDS variable will be (#, 256), but it should be (1, 256). I think it's first splitting the audio into segments and then computing an embedding for each one, but it should use the entire audio. Any clue how to do that?

saumyaborwankar commented 3 years ago

Actually, I figured it out, @davodesign84 and @zhs105: you can just call embed = encoder.embed_utterance(wav) and it'll give you a single 256-element array, which is the embedding for that specific wav file.
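To relate the two results in this thread: as far as I can tell from the Resemblyzer source, the single utterance embedding is the L2-normalized mean of the partial embeddings returned with return_partials=True. The sketch below reproduces that step with NumPy only. The function name is hypothetical and the random array is a stand-in for real partial embeddings, not output from the model.

```python
import numpy as np


def utterance_embedding_from_partials(partials):
    """Average the partial embeddings and L2-normalize the result."""
    partials = np.asarray(partials, dtype=np.float32)
    mean = partials.mean(axis=0)
    return mean / np.linalg.norm(mean)


# Stand-in for the (#, 256) array of partial embeddings (made-up values).
rng = np.random.default_rng(0)
fake_partials = rng.random((5, 256)).astype(np.float32)

embed = utterance_embedding_from_partials(fake_partials)
print(embed.shape)  # (256,)
```

Note that the result here is a flat vector of shape (256,). If downstream code needs a (1, 256) array, `embed[np.newaxis, :]` adds the leading axis.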