Closed by OpenMachinesAI 1 month ago
Yes, please. I looked everywhere to find one.
From this Reddit discussion: https://www.reddit.com/r/LocalLLaMA/comments/1encx98/improved_text_to_speech_model_parler_tts_v1_by/
Laura, Gary, Jon, Lea, Karen, Rick, Brenda, David, Eileen, Jordan, Mike, Yann, Joy, James, Eric, Lauren, Rose, Will, Jason, Aaron, Naomie, Alisa, Patrick, Jerry, Tina, Jenna, Bill, Tom, Carol, Barbara, Rebecca, Anna, Bruce, Emily
Hey, the previous list is indeed correct! However, I've realized that the models perform better on some speakers than others, namely:
Large - Top 20:
Will 0.906055
Eric 0.887598
Laura 0.877930
Alisa 0.877393
Patrick 0.873682
Rose 0.873047
Jerry 0.871582
Jordan 0.870703
Lauren 0.867432
Jenna 0.866455
Karen 0.866309
Rick 0.863135
Bill 0.862207
James 0.856934
Yann 0.856787
Emily 0.856543
Anna 0.848877
Jon 0.848828
Brenda 0.848291
Barbara 0.847998
Mini - Top 20:
Jon 0.908301
Lea 0.904785
Gary 0.903516
Jenna 0.901807
Mike 0.885742
Laura 0.882666
Lauren 0.878320
Eileen 0.875635
Alisa 0.874219
Karen 0.872363
Barbara 0.871509
Carol 0.863623
Emily 0.854932
Rose 0.852246
Will 0.851074
Patrick 0.850977
Eric 0.845459
Rick 0.845020
Anna 0.844922
Tina 0.839160
Would you like to add all of this information to the repo somewhere? If so, feel free to open a PR!
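For anyone who wants a single shortlist that works with both checkpoints, here is a minimal sketch that intersects the two Top 20 lists above. The scores are copied from this thread; ranking by the mean of the two scores is just one possible choice and not something the maintainers recommend, and the helper name `rank_shared` is made up for illustration.

```python
# Scores copied from the "Large - Top 20" and "Mini - Top 20" lists in this thread.
large_top20 = {
    "Will": 0.906055, "Eric": 0.887598, "Laura": 0.877930, "Alisa": 0.877393,
    "Patrick": 0.873682, "Rose": 0.873047, "Jerry": 0.871582, "Jordan": 0.870703,
    "Lauren": 0.867432, "Jenna": 0.866455, "Karen": 0.866309, "Rick": 0.863135,
    "Bill": 0.862207, "James": 0.856934, "Yann": 0.856787, "Emily": 0.856543,
    "Anna": 0.848877, "Jon": 0.848828, "Brenda": 0.848291, "Barbara": 0.847998,
}
mini_top20 = {
    "Jon": 0.908301, "Lea": 0.904785, "Gary": 0.903516, "Jenna": 0.901807,
    "Mike": 0.885742, "Laura": 0.882666, "Lauren": 0.878320, "Eileen": 0.875635,
    "Alisa": 0.874219, "Karen": 0.872363, "Barbara": 0.871509, "Carol": 0.863623,
    "Emily": 0.854932, "Rose": 0.852246, "Will": 0.851074, "Patrick": 0.850977,
    "Eric": 0.845459, "Rick": 0.845020, "Anna": 0.844922, "Tina": 0.839160,
}


def rank_shared(scores_a, scores_b):
    """Return (name, mean score) pairs for names present in both dicts, best first.

    Averaging the two scores is an assumption made for this sketch.
    """
    shared_names = scores_a.keys() & scores_b.keys()
    ranked = [(name, (scores_a[name] + scores_b[name]) / 2) for name in shared_names]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


shared = rank_shared(large_top20, mini_top20)
for name, mean_score in shared:
    print(f"{name:<8} {mean_score:.6f}")
```

With the numbers above, 14 names appear in both lists; the remaining six in each list are specific to that model size.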
What are the numbers you've included? I'm guessing they might be WER, generation speed, or some other accuracy measure. The list of names is already here: examples/prompt_creation/speaker_ids_to_names.json
The numbers represent the average speaker similarity between a random snippet of the person speaking and a random Parler-generated snippet. The higher the score, the better the model is at keeping the voice consistent. Numbers are from this dataset for Mini and this dataset for Large.
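To make the idea concrete, here is a rough sketch of how such an average could be computed. The thread does not say which speaker embedding model or which similarity metric was used, so cosine similarity over fixed-size embedding vectors is an assumption here, and the toy vectors stand in for real embeddings. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_speaker_similarity(reference_embeddings, generated_embeddings):
    """Mean cosine similarity over paired (reference, generated) embeddings.

    Each pair is assumed to hold one embedding of a real snippet of the speaker
    and one embedding of a generated snippet for the same speaker name.
    """
    scores = [
        cosine_similarity(ref, gen)
        for ref, gen in zip(reference_embeddings, generated_embeddings)
    ]
    return sum(scores) / len(scores)


# Toy 2-D vectors in place of real speaker embeddings (illustration only).
reference = [[1.0, 0.0], [0.0, 1.0]]
generated = [[1.0, 0.0], [1.0, 1.0]]
score = average_speaker_similarity(reference, generated)
print(f"average similarity: {score:.6f}")
```

In practice the vectors would come from a speaker verification model applied to the audio, and the per-speaker score would be averaged over many snippet pairs.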
@ylacombe, how is the similarity score calculated? Did you use a specific speaker embedding model to obtain it?
just want a list