maum-ai / voicefilter

Unofficial PyTorch implementation of Google AI's VoiceFilter system
http://swpark.me/voicefilter

the model implementation comprehension #22

Closed kurbobo closed 4 years ago

kurbobo commented 4 years ago

Hello, I'm a master's student at ITMO University in Saint Petersburg, Russia.

Could you please explain what exactly this model implementation does? As I understand it (variant 1), it takes as input the mixed audio of speaker A and speaker B, plus the clean voice of A, i.e. the very same utterance that is inside the mixture, and tries to extract it from the mix (which would be strange, because it would be useless). But the paper (variant 2) says it should take the mixture plus a clean reference utterance of the target speaker, and NOT the same utterance that is in the mixture. That is the point.
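For concreteness, here is a toy sketch of what variant 2 from the paper looks like: a reference utterance of the target speaker (a different recording, not the one inside the mix) is turned into a d-vector, and the mixture is masked conditioned on that d-vector. The module names, layer choices, and tensor shapes below are made up for illustration and are not the actual code of this repo:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real speaker encoder and separation network;
# names, layers, and shapes are assumptions, not the repo's actual model code.
class ToySpeakerEncoder(nn.Module):
    """Maps a reference utterance's mel spectrogram to a fixed d-vector."""
    def __init__(self, n_mels=40, dvec_dim=256):
        super().__init__()
        self.proj = nn.Linear(n_mels, dvec_dim)

    def forward(self, ref_mel):              # ref_mel: (batch, frames, n_mels)
        return self.proj(ref_mel).mean(dim=1)  # (batch, dvec_dim)

class ToyVoiceFilter(nn.Module):
    """Predicts a soft mask for the mixed spectrogram, conditioned on the d-vector."""
    def __init__(self, n_freq=601, dvec_dim=256):
        super().__init__()
        self.proj = nn.Linear(n_freq + dvec_dim, n_freq)

    def forward(self, mixed_mag, dvec):      # mixed_mag: (batch, frames, n_freq)
        d = dvec.unsqueeze(1).expand(-1, mixed_mag.size(1), -1)
        mask = torch.sigmoid(self.proj(torch.cat([mixed_mag, d], dim=-1)))
        return mask * mixed_mag              # estimated target magnitude

# Key point: the reference utterance is a *different* recording of the target
# speaker, not the clean version of the utterance inside the mixture.
ref_mel = torch.randn(1, 100, 40)     # clean reference utterance (speaker A, phrase X)
mixed_mag = torch.randn(1, 300, 601)  # mixture of speaker A (phrase Y) + speaker B

dvec = ToySpeakerEncoder()(ref_mel)
estimated_target = ToyVoiceFilter()(mixed_mag, dvec)
print(estimated_target.shape)         # torch.Size([1, 300, 601])
```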

When I looked at the train/test data made by the generator, I found that every **-mixed.wav is paired with a **-target.wav containing just one of the voices from the mixture, and not another phrase of the target speaker, as I thought the reference audio should be.

Am I right? Or what's going on here?

Waiting for your answer, thank you!

kurbobo commented 4 years ago

I got it, I was just blind: the audio that should be used to make the embedding is listed in the **-dvec.txt file, so everything is alright. Nevertheless, I still have problems with the inference...
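For anyone else who trips over this, here is a minimal sketch of how one generated triplet could be assembled into a training example. Only the file-naming pattern (**-mixed.wav, **-target.wav, **-dvec.txt) comes from this thread; the helper function, the use of librosa, and the sample rate are assumptions for illustration, not the repo's actual dataloader:

```python
import librosa

def load_training_example(prefix, sr=16000):
    """Assemble one training example from the generator's output files.

    For a given prefix (e.g. "000001"), the generator is expected to produce:
        {prefix}-mixed.wav   -> mixture of the target and an interfering speaker
        {prefix}-target.wav  -> clean ground-truth target utterance (the one inside the mix)
        {prefix}-dvec.txt    -> path to a *different* clean utterance of the target
                                speaker, used only to compute the d-vector embedding
    """
    mixed, _ = librosa.load(f"{prefix}-mixed.wav", sr=sr)
    target, _ = librosa.load(f"{prefix}-target.wav", sr=sr)

    # The reference audio for the speaker embedding is not stored as a wav
    # next to the pair; the txt file just points at it.
    with open(f"{prefix}-dvec.txt") as f:
        dvec_wav_path = f.read().strip()
    reference, _ = librosa.load(dvec_wav_path, sr=sr)

    return mixed, target, reference
```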