Closed RedHooman closed 2 months ago
+1 for this, there are a lot of open-source projects popping up that have "OpenAI-compatible" API endpoints for TTS - if you could let us override the host and port under some advanced settings, that would be awesome!
I ended up forking this yesterday and got it to work in Home Assistant, but I don't intend on maintaining the repository long-term.
The changes that would be necessary to support an "OpenAI TTS-Compatible" endpoint are:
- Allow custom URL (or just custom hostname/IP and port number)
- Don't make API Key required
If you want to try my fork out, it should work side-by-side with this repo since I changed the entity IDs, so load this as a custom repo in HACS and give it a try:
https://github.com/qJake/openai_tts
Unfortunately I don't have enough HA/Python development experience to know how to get these changes back into this main repo while supporting both OpenAI itself and a custom endpoint.
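The two changes above can be sketched in a few lines. This is a hypothetical illustration, not the fork's actual code: the function name `build_tts_request` and the constant `DEFAULT_BASE_URL` are made up, but the request shape follows the OpenAI speech API (`POST /v1/audio/speech`), which compatible servers mimic.

```python
# Sketch: support a custom base URL and make the API key optional.
# Names here (build_tts_request, DEFAULT_BASE_URL) are hypothetical.
from typing import Optional

DEFAULT_BASE_URL = "https://api.openai.com"  # fall back to OpenAI itself


def build_tts_request(text: str, voice: str = "alloy", model: str = "tts-1",
                      base_url: Optional[str] = None,
                      api_key: Optional[str] = None):
    """Return (url, headers, payload) for an OpenAI-style speech request."""
    url = (base_url or DEFAULT_BASE_URL).rstrip("/") + "/v1/audio/speech"
    headers = {"Content-Type": "application/json"}
    if api_key:  # local servers often ignore auth, so the key stays optional
        headers["Authorization"] = f"Bearer {api_key}"
    payload = {"model": model, "input": text, "voice": voice}
    return url, headers, payload
```

A local server would then be reached with, say, `base_url="http://192.168.1.50:7851"` and no API key, while omitting `base_url` keeps the stock OpenAI behavior.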
I would look into allowing for a custom model to be specified as well.
By model, do you mean speaker?
Yes. The AllTalk v2 beta that I'm using supports the OpenAI API as a drop-in replacement, and it can map an xTTS voice to each of the 6 supported OpenAI voices:
This suited my needs well enough - I don't need more than 6 distinct voices.
However, yes, you are correct: generally speaking, it would be nice to have a customizable field for the speaker, and/or to be able to change it on the fly as part of the assistant configuration, rather than having to create multiple integrations for multiple speakers.
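The mapping AllTalk does can be pictured as a simple lookup table. The six voice names (`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`) are the ones the OpenAI speech API accepts; the xTTS speaker names on the right are placeholders, since AllTalk's real mapping lives in its own settings:

```python
# Illustrative only: OpenAI voice names mapped to hypothetical xTTS speakers.
OPENAI_TO_XTTS = {
    "alloy": "female_01.wav",
    "echo": "male_01.wav",
    "fable": "male_02.wav",
    "onyx": "male_03.wav",
    "nova": "female_02.wav",
    "shimmer": "female_03.wav",
}


def resolve_speaker(openai_voice: str) -> str:
    # Fall back to a default speaker for unknown voice names.
    return OPENAI_TO_XTTS.get(openai_voice, "female_01.wav")
```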
I submitted a PR which sets `custom_value` to true. That would basically allow more flexibility.
@qJake I took the liberty to use your code as a base to add custom endpoint support. I also opened a pull request.
@ther3zz I also added your change in my PR. (You should keep yours open too, though.)
Do you remember why you changed `mp3` to `wav`?
@raldone01 I found that most of the OpenAI-compatible open-source projects (like AllTalk) default to `.wav`, so it was easier to change it there. That said, AllTalk does support audio transcoding (though I'm not sure what performance penalty that incurs, if any) -
On this front, we have three options, I believe. One is to keep `mp3` and let custom API users handle transcoding.
I think `wav` is most compatible. I will leave this feature out in order to keep the PR as minimal as possible.
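One note on the mp3/wav question: the OpenAI speech endpoint accepts a `response_format` field, so a configurable format could simply be passed through to whichever server is in use. A minimal sketch, assuming the compatible server honors that field (the function name and defaults here are hypothetical):

```python
# Sketch: pass the desired audio format through to an OpenAI-style server.
def build_speech_payload(text: str, voice: str = "alloy",
                         model: str = "tts-1",
                         audio_format: str = "mp3") -> dict:
    # OpenAI documents mp3, opus, aac, flac, wav and pcm as valid values;
    # a compatible server may only support a subset (e.g. wav only).
    return {"model": model, "input": text, "voice": voice,
            "response_format": audio_format}
```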
Hello,
I'm trying to get https://github.com/ther3zz/TTS working with Home Assistant (specifically this fork, as it supports multi-speaker models/xTTSv2). I've spent hours trying to bodge MaryTTS into working with this, without success, and I've given up (I even tried making a proxy via a PHP Laravel app, but MaryTTS seems broken and isn't POSTing the actual text field).
I was wondering if openai_tts could be modified to provide the option to specify a custom host that provides an OpenAI-compatible endpoint. Alternatively, how much work would it be to create a separate project specifically for Coqui TTS? I'd be willing to donate to get this working, as I want all my AI running locally. I pretty much just want to be able to point HA at Coqui TTS, give it the endpoint URL and speaker settings, and have it work.
This is how I've been running the forked TTS:
docker-compose.yml
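The compose file itself wasn't captured above. As a rough sketch, assuming the fork keeps upstream Coqui's `TTS/server/server.py` entrypoint and default port 5002, it might look like this (the image tag, model name, and volume path are placeholders; check the fork's README for the real values):

```yaml
# Hypothetical compose file for a Coqui-style TTS server.
services:
  tts:
    image: ghcr.io/coqui-ai/tts-cpu:latest
    ports:
      - "5002:5002"
    entrypoint: >
      python3 TTS/server/server.py
      --model_name tts_models/multilingual/multi-dataset/xtts_v2
      --port 5002
    volumes:
      - ./models:/root/.local/share/tts
```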
Once the TTS server has been started, a request to the server takes the following parameters:
I don't think there is an API endpoint that returns the list of supported languages or speakers, but perhaps, as a temporary measure, these could be specified manually until such an endpoint is added upstream?
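For reference, upstream Coqui's demo server exposes an `/api/tts` route. Assuming the fork keeps that route, a request URL could be assembled as below; the parameter names (`text`, `speaker_id`, `language_id`) follow upstream Coqui's server and are an assumption for this fork:

```python
from urllib.parse import urlencode

# Sketch of a GET request against a Coqui-style /api/tts endpoint.
# Parameter names are taken from upstream Coqui's demo server and may
# differ in the fork.
def build_tts_url(base_url: str, text: str,
                  speaker_id: str = "", language_id: str = "") -> str:
    query = urlencode({"text": text,
                       "speaker_id": speaker_id,
                       "language_id": language_id})
    return base_url.rstrip("/") + "/api/tts?" + query
```

The resulting URL can be handed to any HTTP client; the server responds with WAV audio.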
Thanks!