microsoft / SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
MIT License

Does the pre-trained model for the hidden unit tokenizer use speaker embeddings? #73

Open Kodhandarama opened 4 months ago

Kodhandarama commented 4 months ago

Can you please elaborate on the role of speaker embeddings in the hidden unit tokenizer and what effect they have?