sfortis / openai_tts

OpenAI TTS custom component for HA
GNU General Public License v3.0

[Feature] Custom endpoint URL #16

Closed: RedHooman closed this issue 2 months ago

RedHooman commented 3 months ago

Hello,

I'm trying to get https://github.com/ther3zz/TTS working with Home Assistant (specifically this fork, as it supports multi-speaker models/xTTSv2). I've spent hours trying to bodge MaryTTS into working with this, without success, and I've given up (I even tried building a proxy as a PHP Laravel app, but the MaryTTS integration seems broken and isn't POSTing the actual text field).

I was wondering if openai_tts could be modified to provide the option to specify a custom host that exposes an OpenAI-compatible endpoint. Alternatively, how much work would it be to create a separate project specifically for Coqui TTS? I'd be willing to donate to get this working, as I want all my AI running locally. I pretty much just want to be able to point HA at Coqui TTS, give it the endpoint URL and speaker settings, and have it work.

This is how I've been running the forked TTS:

docker-compose.yml

    coqui-ai-tts:
        build: ./TTS
        container_name: tts
        restart: "no"  # quoted so YAML doesn't parse it as boolean false
        environment:
            - COQUI_TOS_AGREED=1
        # To run the model, use the below entrypoint
        entrypoint: /bin/bash -c 'python3 TTS/server/server.py --list_models && python3 TTS/server/server.py --model_path /root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/model.pth --config_path /root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json --use_cuda true'

        # To download the model, use the below entrypoint
        #entrypoint: /bin/bash -c 'python3 TTS/server/server.py --list_models && python3 TTS/server/server.py --model_name tts_models/multilingual/multi-dataset/xtts_v2 --use_cuda true'
        volumes:
            - ./volumes/tts_models:/app/tts_models
            - ./volumes/tts:/root/.local/share/tts
        ports:
            - 5002:5002
        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          device_ids: ['0']
                          capabilities: [gpu]

Once the TTS server has started, a request to it looks like this:

GET /api/tts?text=SOMETEXTHERE&speaker_id=Claribel%20Dervla&style_wav=&language_id=en

Parameters:

  • text: the text to synthesize
  • speaker_id: the voice to use (e.g. Claribel Dervla)
  • style_wav: optional reference wav for style conditioning
  • language_id: the language code (e.g. en)
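
For reference, here's a quick way to exercise that endpoint from Python (a minimal sketch assuming the container from the compose file above, listening on localhost:5002):

    import requests

    # Parameters match the Coqui TTS server's /api/tts endpoint shown above.
    params = {
        "text": "Hello from Home Assistant!",
        "speaker_id": "Claribel Dervla",
        "style_wav": "",
        "language_id": "en",
    }
    resp = requests.get("http://localhost:5002/api/tts", params=params, timeout=60)
    resp.raise_for_status()

    # The server responds with audio (wav); save it for playback.
    with open("out.wav", "wb") as f:
        f.write(resp.content)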

I don't think there is an API endpoint to return the list of supported languages or speakers, but perhaps, as a stopgap, these could be specified manually until an endpoint is added upstream?

Thanks!

qJake commented 3 months ago

+1 for this. There are a lot of open-source projects popping up that expose "OpenAI-compatible" TTS API endpoints. If you could let us override the host and port under some advanced settings, that would be awesome!

qJake commented 3 months ago

I ended up forking this yesterday and got it to work in Home Assistant, but I don't intend on maintaining the repository long-term.

The changes that would be necessary to support an "OpenAI TTS-Compatible" endpoint are:

  • Allow custom URL (or just custom hostname/IP and port number)
  • Don't make API Key required
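
For the sake of discussion, the config-schema side of that could be as small as something like this (a rough sketch with made-up constant names, not the actual diff; HA config flows use voluptuous):

    import voluptuous as vol

    # Hypothetical keys; the real integration may name these differently.
    CONF_API_KEY = "api_key"
    CONF_BASE_URL = "base_url"

    DATA_SCHEMA = vol.Schema({
        # Optional rather than required, so unauthenticated local servers work.
        vol.Optional(CONF_API_KEY): str,
        # Defaults to OpenAI; point it at any OpenAI-compatible server instead.
        vol.Optional(CONF_BASE_URL, default="https://api.openai.com/v1"): str,
    })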

If you want to try my fork out, it should work side-by-side with this repo since I changed the entity IDs, so load this as a custom repo in HACS and give it a try:

https://github.com/qJake/openai_tts

Unfortunately I don't have enough HA/Python development experience to know how to get these changes back into this main repo while supporting both OpenAI itself and a custom endpoint.

ther3zz commented 3 months ago

> I ended up forking this yesterday and got it to work in Home Assistant, but I don't intend on maintaining the repository long-term.
>
> The changes that would be necessary to support an "OpenAI TTS-Compatible" endpoint are:
>
> • Allow custom URL (or just custom hostname/IP and port number)
> • Don't make API Key required
>
> If you want to try my fork out, it should work side-by-side with this repo since I changed the entity IDs, so load this as a custom repo in HACS and give it a try:
>
> https://github.com/qJake/openai_tts
>
> Unfortunately I don't have enough HA/Python development experience to know how to get these changes back into this main repo while supporting both OpenAI itself and a custom endpoint.

I would look into allowing a custom model to be specified as well.

qJake commented 3 months ago

> I would look into allowing a custom model to be specified as well.

By model, do you mean speaker?

Yes. The AllTalk v2 beta that I'm using supports the OpenAI API as a drop-in replacement, and it can map an xTTS voice to each of the 6 supported OpenAI voices:

[screenshot: AllTalk's OpenAI voice-mapping settings]

This suited my needs enough - I don't need more than 6 distinct voices.

However, yes, you are correct - generally speaking, it would be nice to have a customizable field for speaker and/or be able to change that on the fly as part of the assistant configuration rather than having to create multiple integrations for multiple speakers.

ther3zz commented 3 months ago

> > I would look into allowing a custom model to be specified as well.
>
> By model, do you mean speaker?
>
> Yes. The AllTalk v2 beta that I'm using supports the OpenAI API as a drop-in replacement, and it can map an xTTS voice to each of the 6 supported OpenAI voices:
>
> [screenshot: AllTalk's OpenAI voice-mapping settings]
>
> This suited my needs enough - I don't need more than 6 distinct voices.
>
> However, yes, you are correct - generally speaking, it would be nice to have a customizable field for speaker and/or be able to change that on the fly as part of the assistant configuration rather than having to create multiple integrations for multiple speakers.

I submitted a PR which sets custom_value to true, so the dropdown also accepts free-typed entries. That basically allows more flexibility.
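
For anyone following along: custom_value is the Home Assistant selector option that lets a dropdown also accept free-typed entries. A minimal sketch of the idea (the option list and variable name are illustrative, not the actual PR):

    from homeassistant.helpers.selector import (
        SelectSelector,
        SelectSelectorConfig,
    )

    # custom_value=True keeps the predefined voices in the dropdown
    # but also lets the user type any other voice/speaker name.
    VOICE_SELECTOR = SelectSelector(
        SelectSelectorConfig(
            options=["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
            custom_value=True,
        )
    )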

raldone01 commented 2 months ago

@qJake I took the liberty of using your code as a base to add custom endpoint support. I also opened a pull request.

@ther3zz I also added your change in my PR. (You should keep yours open too, though.)

Do you remember why you changed mp3 to wav?

qJake commented 2 months ago

> Do you remember why you changed mp3 to wav?

@raldone01 I found that most of the OpenAI-compatible open-source projects (like AllTalk) default to .wav, so it was easier to change it there. However, AllTalk does support audio transcoding (though I'm not sure what performance penalty that incurs, if any):

[screenshot: AllTalk's audio transcoding settings]

On this front, I believe we have three options:

  1. Offer a dropdown, defaulting to mp3, and let the user choose the file type they expect from the custom API.
  2. Detect the file type automatically within the integration (probably difficult, but maybe not? See the sketch below).
  3. Leave it hardcoded to mp3 and let custom-API users handle transcoding.
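
Option 2 might be simpler than it sounds, since both formats are identifiable from their first few bytes. A rough sketch (hypothetical helper, not code from either repo):

    def detect_audio_format(data: bytes) -> str:
        """Guess whether an audio payload is wav or mp3 from its magic bytes."""
        # WAV files start with a RIFF chunk whose format field is "WAVE".
        if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
            return "wav"
        # MP3 files start with an ID3 tag or an MPEG frame sync (11 set bits).
        if data[:3] == b"ID3" or (
            len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
        ):
            return "mp3"
        return "unknown"
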
raldone01 commented 2 months ago

I think wav is the most compatible. I will leave this feature out in order to keep the PR as minimal as possible.