inclusion-international / speech-jokey

A speech synthesis software with integration of several TTS APIs, SSML support and optimizations for users with motor impairment. (course ASSIST HEIDI WS2023 and SS2024)
MIT License

Integration with ElevenlabsAPI: Voices sound different over API than on website #5

Open HackXIt opened 1 month ago

HackXIt commented 1 month ago

We actually managed to get a fair bit done in the last few weeks, but we ran into some problems. The current working state, which Natascha already has for testing, is the version on branch 'develop_breaks'. We managed to integrate the ElevenLabs API, synthesize an audio file, and play it. We also added a new button that lets you choose a voice. However, when we generate audio, the voices don't sound like they do on the ElevenLabs website. The male voices especially don't sound male, but like female voices at a lower pitch. Did you have that problem in the old version as well? Or do you maybe have an idea what went wrong? 😅

Relevant commit: 846b93a

HackXIt commented 1 month ago

My initial assumption would be that APIs can indeed change, and can also behave differently from what the provider shows on their website.

You need to keep in mind that the ElevenLabs machine learning model is HEAVILY influenced by what is actually written in the text.

It is a generative model, so its output is not very deterministic, although in exchange the output can be much better and more refined.

If, for example, you write a text with 'Speak this in a brutal and very manly way', the AI would be inclined to do so, even if the chosen voice is female.

There can also be nuances in the writing that have an impact in that regard.

If you come across things like that, it's always best to provide the inputs that gave you those results; otherwise it is nearly impossible for me to reproduce them.
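To make "provide your inputs" concrete, here is a minimal sketch of how the inputs for one synthesis call could be captured and pasted into an issue. It is not code from this repository: `SynthesisRecord`, its field names, and the example voice ID are hypothetical, and the `model_id` and `voice_settings` fields assume that your request sets such parameters. Nothing here calls the ElevenLabs API; it only records what you sent.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class SynthesisRecord:
    """Inputs of one synthesis request (hypothetical helper, not part of the repo)."""

    text: str
    voice_id: str
    model_id: str = ""  # assumption: left empty if the request does not set a model
    voice_settings: dict = field(default_factory=dict)  # assumption: e.g. stability values
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def fingerprint(self) -> str:
        """Short hash over the inputs that can influence the audio (not the timestamp)."""
        payload = {
            "text": self.text,
            "voice_id": self.voice_id,
            "model_id": self.model_id,
            "voice_settings": self.voice_settings,
        }
        canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

    def to_json(self) -> str:
        """Serialize the record, including its fingerprint, for pasting into an issue."""
        data = asdict(self)
        data["fingerprint"] = self.fingerprint()
        return json.dumps(data, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # "example-voice-id" is a placeholder, not a real ElevenLabs voice ID.
    record = SynthesisRecord(text="Hello, this is a test.", voice_id="example-voice-id")
    print(record.to_json())
```

Two requests with the same text, voice, model, and settings get the same fingerprint, so it is easy to tell whether two audio files that sound different really came from identical inputs. Because the model is generative, identical inputs may still produce different audio; the record only rules out differences in what was sent.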