Open bigdan12 opened 9 months ago
Hi, I think it's OK, since the speaker encoder indirectly learns to extract speaker identity. I tried other features such as wav2vec 2.0, but they were less effective than CPC features. I think using speaker verification (SV) features for the speaker encoder could be effective, but the auxiliary task (speaker classification) was not meaningful in my case.
Hi, why not add speaker classification to the speaker encoder, or use speaker verification features? If I only use a speaker encoder, will there be any problems with timbral coupling?