wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Apache License 2.0

Do we have quantizer of speaker embedding? #247

Closed Liu-Feng-deeplearning closed 7 months ago

Liu-Feng-deeplearning commented 7 months ago

Hello, is there a plan to support a quantized speaker-embedding extractor, e.g. one based on RVQ or LFQ?

If speaker embeddings could be quantized, it would be easy to use the resulting integer codes as a prompt placed before the audio tokens (e.g. VALL-E/SoundStream).
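
To make the idea concrete, here is a minimal sketch of residual vector quantization (RVQ) applied to one speaker embedding, with the resulting integer codes prepended to an audio-token sequence. This is not part of wespeaker: the function names, the sizes, the random codebooks (a stand-in for trained ones), and the id-offset scheme are all assumptions made for illustration.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(embedding, codebooks):
    """Quantize the residual stage by stage; return one integer code per stage."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # The next stage only sees what this stage failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct an approximate embedding by summing the selected entries."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical sizes; real speaker embeddings are typically much larger.
DIM, NUM_STAGES, CODEBOOK_SIZE = 8, 4, 16
rng = random.Random(0)

# Random codebooks stand in for trained ones (assumption). The scale shrinks
# per stage because later stages model smaller residuals.
codebooks = [
    [[rng.gauss(0, 1.0 / 2 ** s) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE)]
    for s in range(NUM_STAGES)
]

speaker_embedding = [rng.gauss(0, 1) for _ in range(DIM)]
speaker_codes = rvq_encode(speaker_embedding, codebooks)

# Hypothetical id layout: shift speaker codes past the audio vocabulary so the
# two token types never collide, then prepend them as the prompt.
AUDIO_VOCAB = 1024
audio_tokens = [17, 402, 9]
prompt = [
    AUDIO_VOCAB + stage * CODEBOOK_SIZE + code
    for stage, code in enumerate(speaker_codes)
] + audio_tokens
```

In practice the codebooks would have to be learned (for example jointly with the speaker model or on top of frozen embeddings), which is the part the request above is asking about.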

Thank you for the excellent project.

wsstriving commented 7 months ago

Currently, no. Thanks for the suggestion. Could you explain why this kind of embedding is needed? I believe LLM-based TTS systems can quite easily take continuous speaker embeddings as input, given their modeling capacity.
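
As a rough illustration of the continuous alternative, the sketch below projects a speaker embedding to the model dimension and prepends it as a single prefix vector before the embedded audio tokens. All names and sizes are hypothetical, the weights are random placeholders rather than trained parameters, and this is not an existing wespeaker or TTS API.

```python
import random


def linear(vec, weight, bias):
    """Compute y = W x + b, with weight stored as a list of output rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


# Hypothetical sizes, chosen small for readability.
SPK_DIM, MODEL_DIM, VOCAB = 6, 4, 10
rng = random.Random(0)

# Placeholder parameters; in a real system these would be learned.
proj_w = [[rng.gauss(0, 0.1) for _ in range(SPK_DIM)] for _ in range(MODEL_DIM)]
proj_b = [0.0] * MODEL_DIM
token_table = [[rng.gauss(0, 0.1) for _ in range(MODEL_DIM)] for _ in range(VOCAB)]

speaker_embedding = [rng.gauss(0, 1) for _ in range(SPK_DIM)]
audio_tokens = [3, 7, 1]

# One projected speaker vector is placed in front of the token embeddings,
# so no quantizer or extra vocabulary entries are needed.
speaker_prefix = linear(speaker_embedding, proj_w, proj_b)
model_input = [speaker_prefix] + [token_table[t] for t in audio_tokens]
```

The trade-off is that the prefix lives in the model's embedding space rather than its token vocabulary, so it can condition the model but cannot itself be predicted or sampled as a discrete token.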

wsstriving commented 7 months ago

Closed since there has been no further discussion. Feel free to re-open it.