microsoft / SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
MIT License
1.09k stars, 113 forks

Pretrain SpeechT5 on my own dataset #38

Closed by hungker 1 year ago

hungker commented 1 year ago

If I want to pretrain SpeechT5 on my own dataset, how can I get the xvector?

mechanicalsea commented 1 year ago

The xvector is extracted with the released pre-trained model, in the way described in #16.
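Issue #16 is not reproduced in this thread, so the following is only a rough sketch of how xvector extraction with a pre-trained speaker model might look. It assumes the SpeechBrain `EncoderClassifier` and the `speechbrain/spkrec-xvect-voxceleb` checkpoint, neither of which is confirmed here as the model meant above. The helper names `load_xvector_encoder`, `l2_normalize`, and `extract_xvector` are hypothetical.

```python
import math


def load_xvector_encoder(source="speechbrain/spkrec-xvect-voxceleb"):
    """Load a pre-trained speaker encoder (assumed checkpoint, see note above)."""
    # Imported lazily so the rest of the module works without SpeechBrain installed.
    try:
        from speechbrain.inference.speaker import EncoderClassifier  # newer releases
    except ImportError:
        from speechbrain.pretrained import EncoderClassifier  # older releases
    return EncoderClassifier.from_hparams(source=source)


def l2_normalize(values):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def extract_xvector(encoder, waveform):
    """Return one L2-normalized speaker embedding as a list of floats.

    `waveform` is assumed to be a tensor of shape [1, num_samples] holding a
    single utterance at the sample rate the encoder was trained on.
    """
    embedding = encoder.encode_batch(waveform)
    return l2_normalize(embedding.squeeze().tolist())
```

With SpeechBrain and torchaudio installed, usage might look like `extract_xvector(load_xvector_encoder(), torchaudio.load("utt.wav")[0])`, where `utt.wav` is a placeholder for a mono recording. The L2 normalization step is an assumption about the expected input, so check it against #16 before pretraining.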

hungker commented 1 year ago

Thank you very much. By the way, how do I use the SPM model? I didn't see a tutorial.

Ajyy commented 1 year ago

You can refer to the sentencepiece Git repo for more details about the SPM model.
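As a minimal sketch of what that usually involves, the snippet below loads a SentencePiece model with the `sentencepiece` Python package and encodes text lines into space-separated pieces. The helper names `load_spm` and `encode_lines` are hypothetical, and the output format SpeechT5's data preparation expects is an assumption, not something stated in this thread.

```python
def load_spm(model_path):
    """Load a SentencePiece model from disk."""
    # Imported lazily so encode_lines can be used with any compatible processor.
    import sentencepiece as spm
    return spm.SentencePieceProcessor(model_file=model_path)


def encode_lines(processor, lines):
    """Encode each text line into a string of space-separated pieces."""
    return [
        " ".join(processor.encode(line.strip(), out_type=str))
        for line in lines
    ]
```

Usage might look like `encode_lines(load_spm("path/to/spm.model"), open("train.txt"))`, where both paths are placeholders. Passing `out_type=int` to `encode` would return vocabulary ids instead of piece strings.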