Open QQ-777777 opened 1 year ago
Language mismatch usually degrades speaker-embedding performance, but embeddings extracted with speechbrain/spkrec-xvect-voxceleb can still represent Chinese speakers.
If you want the speaker embeddings to be more representative on a Chinese dataset, we recommend retraining or adapting the speaker model on your own data.
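One rough way to judge whether the embeddings still separate speakers on your own data is to compare cosine similarity within and across speakers. The sketch below is only an illustration: the helper names and the toy 3-dimensional vectors are made up, and real x-vectors would come from your own extraction step.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def speaker_separation(labelled_embeddings):
    """Return (mean same-speaker similarity, mean different-speaker similarity).

    labelled_embeddings is a list of (speaker_id, vector) pairs.
    """
    same, diff = [], []
    for (spk1, vec1), (spk2, vec2) in combinations(labelled_embeddings, 2):
        (same if spk1 == spk2 else diff).append(cosine(vec1, vec2))
    if not same or not diff:
        raise ValueError("need at least one same-speaker and one different-speaker pair")
    return sum(same) / len(same), sum(diff) / len(diff)


# Toy stand-ins for real embeddings (hypothetical values, two speakers).
toy = [
    ("spk_a", [1.0, 0.1, 0.0]),
    ("spk_a", [0.9, 0.2, 0.0]),
    ("spk_b", [0.0, 0.1, 1.0]),
    ("spk_b", [0.1, 0.0, 0.9]),
]
same_mean, diff_mean = speaker_separation(toy)
```

If the same-speaker mean is not clearly higher than the different-speaker mean on your Chinese utterances, that would be a hint that retraining or adapting the speaker model is worth the effort.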
@mechanicalsea What about extracting speaker embeddings from a multilingual dataset? I am trying to build a multilingual version of SpeechT5 (mSpeechT5).
You can take advantage of a pre-trained speaker model, such as ECAPA-TDNN from SpeechBrain (https://github.com/speechbrain/speechbrain); one such model is used in our script for extracting speaker embeddings. Alternatively, you can train a multilingual / language-independent speaker recognition model.
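For reference, extracting an embedding with a pre-trained SpeechBrain speaker model might look roughly like the sketch below. It is not the SpeechT5 extraction script itself: the function names are hypothetical, the L2 normalization step is an assumption, and the audio is assumed to be 16 kHz, which is what the VoxCeleb speaker models expect.

```python
MODEL_NAME = "speechbrain/spkrec-xvect-voxceleb"


def load_classifier(model_name=MODEL_NAME, device="cpu"):
    """Download (on first use) and load a pre-trained SpeechBrain speaker encoder."""
    try:
        # Import location in SpeechBrain 1.0 and later.
        from speechbrain.inference.speaker import EncoderClassifier
    except ImportError:
        # Import location in older SpeechBrain releases.
        from speechbrain.pretrained import EncoderClassifier
    return EncoderClassifier.from_hparams(source=model_name, run_opts={"device": device})


def extract_embedding(classifier, wav_path):
    """Return one L2-normalized speaker embedding for a 16 kHz audio file."""
    import torch
    import torchaudio

    signal, sample_rate = torchaudio.load(wav_path)  # shape: [channels, samples]
    if sample_rate != 16000:
        raise ValueError(f"expected 16 kHz audio, got {sample_rate} Hz")
    signal = signal.mean(dim=0, keepdim=True)  # downmix to mono, shape: [1, samples]
    with torch.no_grad():
        embedding = classifier.encode_batch(signal)  # shape: [1, 1, embedding_dim]
    # Assumption: unit-length embeddings are wanted downstream.
    embedding = torch.nn.functional.normalize(embedding, dim=2)
    return embedding.squeeze().cpu().numpy()
```

Usage would be `classifier = load_classifier()` followed by `extract_embedding(classifier, "utt.wav")`, where `utt.wav` is a placeholder path. Passing another model name to `load_classifier` (for example, SpeechBrain's ECAPA-TDNN VoxCeleb model) should work the same way, though the embedding dimension will differ.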
@mechanicalsea thanks!
Hi @StephennFernandes, I wanted to know if you were able to get the speechbrain/spkrec-xvect-voxceleb embeddings working in your multilingual setting. Is the synthesized speech of good quality, without any mechanical artifacts?
> @mechanicalsea what about when extracting speaker embeddings on a multilingual dataset ? i am trying to build a multilingual version of SpeechT5 --> mSpeechT5

Hi Stephenn, have you made any progress? I am interested in doing the same thing, but for Chinese. Does anyone want to join a shared effort?
Hi, I have the same question as https://github.com/microsoft/SpeechT5/issues/16#issuecomment-1516257038. My training dataset is Chinese, so can I use speechbrain/spkrec-xvect-voxceleb to extract speaker embeddings for pre-training?