winddori2002 / TriAAN-VC

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
MIT License
129 stars 12 forks

A question about speaker encoder #21

Open bigdan12 opened 4 months ago

bigdan12 commented 4 months ago

Hi, why not add a speaker classification task to the speaker encoder, or use speaker verification (SV) features? If I use only a speaker encoder, will there be any problems with timbral coupling?

winddori2002 commented 4 months ago

Hi, I think it's OK, since the speaker encoder indirectly learns to extract speaker identity. I tried other features such as wav2vec 2.0, but they were less effective than CPC features. I think using SV features for the speaker encoder could be effective, but the auxiliary task (classification) did not make a meaningful difference in my case.
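
For readers unfamiliar with the auxiliary task discussed here, the following is a minimal sketch of how a speaker classification loss is typically added on top of a conversion loss. It is not taken from the TriAAN-VC code. The names `speaker_logits`, `total_loss`, and `aux_weight`, the linear head, and the default weight of 0.1 are all assumptions made for illustration.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy for one example: logits is a list of floats, target a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def speaker_logits(embedding, weights, bias):
    """Hypothetical linear head: one weight row and one bias per training speaker."""
    return [
        sum(w * e for w, e in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def total_loss(recon_loss, embedding, weights, bias, speaker_id, aux_weight=0.1):
    """Conversion (reconstruction) loss plus a weighted auxiliary classification loss."""
    aux = softmax_cross_entropy(speaker_logits(embedding, weights, bias), speaker_id)
    return recon_loss + aux_weight * aux


# Toy usage: a 2-dimensional speaker embedding and 4 training speakers.
embedding = [0.5, -0.2]
weights = [[0.0, 0.0] for _ in range(4)]  # untrained head, so all logits are equal
bias = [0.0] * 4
loss = total_loss(1.0, embedding, weights, bias, speaker_id=2)
```

Setting `aux_weight=0` recovers the plain setup in which the speaker encoder is trained only through the conversion loss, which is the configuration described in the reply above.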