huggingface / parler-tts

Inference and training library for high-quality TTS models.
Apache License 2.0

any list of all 36 voices? #95

Closed: OpenMachinesAI closed this issue 1 month ago

OpenMachinesAI commented 3 months ago

I just want a list.

LuckyMcBeast commented 3 months ago

Yes, please. I looked everywhere to find one.

Yazorp commented 3 months ago

From this Reddit discussion: https://www.reddit.com/r/LocalLLaMA/comments/1encx98/improved_text_to_speech_model_parler_tts_v1_by/

Laura, Gary, Jon, Lea, Karen, Rick, Brenda, David, Eileen, Jordan, Mike, Yann, Joy, James, Eric, Lauren, Rose, Will, Jason, Aaron, Naomie, Alisa, Patrick, Jerry, Tina, Jenna, Bill, Tom, Carol, Barbara, Rebecca, Anna, Bruce, Emily

ylacombe commented 3 months ago

Hey, the previous list is indeed correct! However, I've realized that the models were better at some speakers, namely:

Large - Top 20:

Will       0.906055
Eric       0.887598
Laura      0.877930
Alisa      0.877393
Patrick    0.873682
Rose       0.873047
Jerry      0.871582
Jordan     0.870703
Lauren     0.867432
Jenna      0.866455
Karen      0.866309
Rick       0.863135
Bill       0.862207
James      0.856934
Yann       0.856787
Emily      0.856543
Anna       0.848877
Jon        0.848828
Brenda     0.848291
Barbara    0.847998

Mini - Top 20:

Jon        0.908301
Lea        0.904785
Gary       0.903516
Jenna      0.901807
Mike       0.885742
Laura      0.882666
Lauren     0.878320
Eileen     0.875635
Alisa      0.874219
Karen      0.872363
Barbara    0.871509
Carol      0.863623
Emily      0.854932
Rose       0.852246
Will       0.851074
Patrick    0.850977
Eric       0.845459
Rick       0.845020
Anna       0.844922
Tina       0.839160

Would you like to add all of this information to the repo somewhere? If so, feel free to open a PR!
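For anyone who wants a voice that holds up on both checkpoints, the two top-20 lists above can be intersected. This is only a rough sketch: the scores are copied from the tables above, and ranking by the lower of the two scores is my own choice for illustration, not something the maintainers recommend.

```python
# Scores copied from the "Large - Top 20" and "Mini - Top 20" tables above.
large_top20 = {
    "Will": 0.906055, "Eric": 0.887598, "Laura": 0.877930, "Alisa": 0.877393,
    "Patrick": 0.873682, "Rose": 0.873047, "Jerry": 0.871582, "Jordan": 0.870703,
    "Lauren": 0.867432, "Jenna": 0.866455, "Karen": 0.866309, "Rick": 0.863135,
    "Bill": 0.862207, "James": 0.856934, "Yann": 0.856787, "Emily": 0.856543,
    "Anna": 0.848877, "Jon": 0.848828, "Brenda": 0.848291, "Barbara": 0.847998,
}
mini_top20 = {
    "Jon": 0.908301, "Lea": 0.904785, "Gary": 0.903516, "Jenna": 0.901807,
    "Mike": 0.885742, "Laura": 0.882666, "Lauren": 0.878320, "Eileen": 0.875635,
    "Alisa": 0.874219, "Karen": 0.872363, "Barbara": 0.871509, "Carol": 0.863623,
    "Emily": 0.854932, "Rose": 0.852246, "Will": 0.851074, "Patrick": 0.850977,
    "Eric": 0.845459, "Rick": 0.845020, "Anna": 0.844922, "Tina": 0.839160,
}

# Speakers present in both lists, ranked by their weaker score
# (an arbitrary ranking rule chosen for this sketch).
both = sorted(
    set(large_top20) & set(mini_top20),
    key=lambda name: min(large_top20[name], mini_top20[name]),
    reverse=True,
)

for name in both:
    print(f"{name:<8} large={large_top20[name]:.6f} mini={mini_top20[name]:.6f}")
```

Speakers that appear in only one of the lists are dropped by the intersection, so they may still be good choices for that one model.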

dgm3333 commented 3 months ago

What are the numbers you've included? I'm guessing they might be WER, generation speed, or some other accuracy measure. The list of names is already here: examples/prompt_creation/speaker_ids_to_names.json

ylacombe commented 3 months ago

The numbers are the average speaker similarity between a random snippet of the person speaking and a random Parler-generated snippet. The higher the score, the better the model keeps the voice consistent. The numbers are from this dataset for Mini and this dataset for Large.
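To make the idea concrete, here is a minimal sketch of how an average similarity of this kind could be computed. The thread does not say which speaker embedding model or similarity measure was used, so cosine similarity, the random pairing, and the function names below are all assumptions for illustration; the toy vectors stand in for real speaker embeddings.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_speaker_similarity(reference_embeddings, generated_embeddings,
                               num_pairs=100, seed=0):
    """Average similarity over randomly drawn (reference, generated) pairs.

    Hypothetical helper: the real evaluation procedure is not described
    in the thread.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(num_pairs):
        ref = rng.choice(reference_embeddings)   # snippet of the real speaker
        gen = rng.choice(generated_embeddings)   # Parler-generated snippet
        scores.append(cosine_similarity(ref, gen))
    return sum(scores) / len(scores)


# Toy 3-dimensional "embeddings"; real speaker embeddings would come from
# a speaker verification model and have far more dimensions.
reference = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.3]]
generated = [[0.8, 0.1, 0.25], [1.0, 0.05, 0.1]]

score = average_speaker_similarity(reference, generated)
print(f"average similarity: {score:.3f}")
```

With cosine similarity, a score near 1 would mean the generated voice stays close to the reference speaker, which matches the "higher is better" reading above.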

kdcyberdude commented 1 month ago

@ylacombe, how is the similarity score calculated? Did you use a specific speaker embedding model to obtain it?

ylacombe commented 1 month ago

Closed by https://github.com/huggingface/parler-tts/pull/141!